Crowd Counting from a Still Image Using Multi-scale Fully Convolutional Network with Adaptive Human-Shaped Kernel

Cao, Jinmeng; Yang, Biao; Zhang, Yuyu; Zou, Ling

doi:10.1007/978-3-319-92753-4_19

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10799)

Included in the following conference series:

Pacific-Rim Symposium on Image and Video Technology

Abstract

Estimating the crowd count from a still image with arbitrary perspective and density level is one of the challenges in crowd analysis. Earlier techniques perform poorly in highly congested scenes containing several thousand people. To address this problem, we propose a Multi-scale Fully Convolutional Network for robust crowd counting, which is achieved by estimating a density map. Our approach makes the following contributions: (1) An adaptive human-shaped kernel is proposed to generate the ground truth of the density map. (2) A deep, multi-scale, fully convolutional network is proposed to predict crowd counts; a per-scale loss is used to guarantee the effectiveness of the multi-scale strategy. (3) Several strategies, e.g. de-convolution and minimizing the per-scale loss, are explored to improve the counting performance of the proposed approach. Our approach adapts not only to sparse scenes but also to dense ones. In addition, it achieves state-of-the-art counting performance on benchmark datasets, including the World Expo'10, UCF_CC_50, and UCSD datasets.
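
The abstract does not give the form of the adaptive human-shaped kernel or of the per-scale loss, so the following is only a minimal sketch of the general density-map idea under stated assumptions. It substitutes a plain isotropic Gaussian for the human-shaped kernel and assumes the per-scale loss is a sum of mean squared errors over scales. The function names, `sigma`, and the downsampling factors are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def gaussian_density_map(points, height, width, sigma=4.0):
    """Build a ground-truth density map from head annotations.

    Assumption: an isotropic Gaussian stands in for the paper's
    adaptive human-shaped kernel, whose definition is not given here.
    `points` is an iterable of (row, col) positions inside the image.
    """
    density = np.zeros((height, width), dtype=np.float64)
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    for r, c in points:
        blob = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Renormalize so that every person contributes a total mass of 1.
        density += blob / blob.sum()
    return density


def downsample_sum(density, factor):
    """Reduce resolution by summing factor x factor blocks (preserves the count)."""
    h, w = density.shape
    h2, w2 = h // factor, w // factor
    cropped = density[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).sum(axis=(1, 3))


def per_scale_loss(predictions, target, factors):
    """Sum of per-scale mean squared errors (assumed form of the loss).

    `predictions[i]` is a predicted density map at the resolution
    obtained by downsampling `target` by `factors[i]`.
    """
    total = 0.0
    for pred, factor in zip(predictions, factors):
        scaled_target = downsample_sum(target, factor)
        total += float(np.mean((pred - scaled_target) ** 2))
    return total


# Three hypothetical head annotations in a 64 x 64 image.
heads = [(10, 12), (30, 40), (31, 42)]
density = gaussian_density_map(heads, height=64, width=64)

# The crowd count is the integral (sum) of the density map.
estimated_count = density.sum()

# A perfect prediction at every scale gives zero per-scale loss.
factors = [1, 2, 4]
perfect_predictions = [downsample_sum(density, f) for f in factors]
loss = per_scale_loss(perfect_predictions, density, factors)
```

In this sketch the count is recovered by summing the density map, which is why each kernel is normalized to unit mass and why the lower-resolution targets are built by block summation rather than averaging.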

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant No. 61501060, the Natural Science Foundation of Jiangsu Province under Grant No. BK20150271, and the Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province under Grant BM20082061708.

Author information

Authors and Affiliations

Changzhou University, Changzhou, Jiangsu, China
Jinmeng Cao, Biao Yang, Yuyu Zhang & Ling Zou

Corresponding author

Correspondence to Biao Yang.

Editor information

Editors and Affiliations

National Institute of Informatics, Tokyo, Japan
Shin'ichi Satoh

Cite this paper

Cao, J., Yang, B., Zhang, Y., Zou, L. (2018). Crowd Counting from a Still Image Using Multi-scale Fully Convolutional Network with Adaptive Human-Shaped Kernel. In: Satoh, S. (eds) Image and Video Technology. PSIVT 2017. Lecture Notes in Computer Science, vol 10799. Springer, Cham. https://doi.org/10.1007/978-3-319-92753-4_19

DOI: https://doi.org/10.1007/978-3-319-92753-4_19
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92752-7
Online ISBN: 978-3-319-92753-4
eBook Packages: Computer Science, Computer Science (R0)
