Scene-adaptive accurate and fast vertical crowd counting via joint using depth and color information
Reliable, real-time crowd counting is one of the most important tasks in intelligent visual surveillance systems. Most previous works count passing people using color information only. Because of the inherent limitations of color information, such methods are inevitably affected by unpredictable, complex environments (e.g., illumination, occlusion, and shadow). To overcome this bottleneck, we propose a new crowd counting algorithm based on multimodal joint information processing. Our method uses color and depth information together, captured with an ordinary depth camera (e.g., Microsoft Kinect). Specifically, we first detect the head of each passing or still person in the surveillance region on the depth information, with the ability to adapt to varying scenes. Then, we track and count each detected head on the color information. The characteristic advantage of our algorithm is that it is scene adaptive: it can be applied directly to all kinds of scenes without additional conditions. Based on the proposed approach, we have built a practical system for robust and fast crowd counting in complicated scenes. Extensive experimental results show the effectiveness of the proposed method.
Keywords: Multimodal joint multimedia processing; Crowd counting; Ordinary depth camera; Scene-adaptive scheme; Real time system
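To make the two-stage pipeline in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a top-view depth frame given as a 2D list of camera-to-surface distances in millimetres, estimates the floor depth from the frame itself (a simple stand-in for scene adaptation), and treats the pixels nearest the camera within each tall blob as a head. The function names `detect_heads` and `count_crossings`, and the parameters `min_height`, `head_band`, `min_area`, and `line_row`, are hypothetical. The color-based tracking step is not shown; `count_crossings` assumes the head tracks are already available.

```python
from collections import deque
from statistics import median


def detect_heads(depth, min_height=1000, head_band=150, min_area=3):
    """Return (row, col) centroids of head-like regions in a top-view depth frame.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(depth), len(depth[0])
    # Assumption: most pixels see the floor, so the median approximates its depth.
    floor = median(v for row in depth for v in row)
    seen = [[False] * cols for _ in range(rows)]
    heads = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or floor - depth[r][c] < min_height:
                continue
            # Flood-fill one 4-connected blob of pixels well above the floor.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and floor - depth[ny][nx] >= min_height):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The head is the part of the blob closest to the overhead camera.
            top = min(depth[y][x] for y, x in blob)
            head = [(y, x) for y, x in blob if depth[y][x] - top <= head_band]
            if len(head) >= min_area:
                cy = sum(y for y, _ in head) / len(head)
                cx = sum(x for _, x in head) / len(head)
                heads.append((cy, cx))
    return heads


def count_crossings(tracks, line_row):
    """Count tracks that start and end on opposite sides of a virtual line.

    Each track is a list of (row, col) head centroids over time.
    Returns (downward_count, upward_count).
    """
    down = up = 0
    for track in tracks:
        start, end = track[0][0], track[-1][0]
        if start < line_row <= end:
            down += 1
        elif end < line_row <= start:
            up += 1
    return down, up
```

In use, `detect_heads` would run on every depth frame, a tracker of the reader's choice would link the centroids across frames using the color image, and `count_crossings` would then tally the finished tracks in each direction.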
This work was supported by the China National Funds for Distinguished Young Scientists under Grant No. 60925010, the Natural Science Foundation of China under Grant No. 61272517, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120005130002, the Co-sponsored Project of Beijing Committee of Education, the Funds for Creative Research Groups of China under Grant No. 61121001, and the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT1049.