Abstract
Smart/Intelligent video surveillance technology plays the central role in the emerging smart city systems. Most intelligent visual algorithms require large-scale image/video datasets to train classifiers or acquire discriminative features using machine learning. However, most existing datasets are collected from non-surveillance conditions, which have significant differences as compared to the practical surveillance data. As a consequence, many existing intelligent visual algorithms trained on traditional datasets perform not so well in the real world surveillance applications. We believe the lack of high quality surveillance datasets has greatly limited the application of the computer vision algorithms in practical surveillance scenarios. To solve this problem, one large-scale and comprehensive surveillance image and video database and test platform, called Benchmark and Evaluation of Surveillance Task (abbreviated as BEST), is developed in this work. The original images and videos in BEST were all collected from on-using surveillance cameras, and have been carefully selected to cover a wide and balanced range of outdoor surveillance scenarios. Compared with the existing surveillance/non-surveillance datasets, the proposed BEST dataset provides a realistic, extensive and diversified testbed for a more comprehensive performance evaluation. Our experimental results show that, performance of seven pedestrian detection algorithms on BEST is worse than that on the existing datasets. This highlights the difference between non-surveillance data and real surveillance data, which is the major cause of the performance decreases. The dataset is open to the public and can be downloaded at: http://ivlab.sjtu.edu.cn/best/Data/List/Datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Shu, C.F., Hampapur, A., Lu, M., Brown, L., Connell, J., Senior, A., Tian, Y.: IBM smart surveillance system (s3): a open and extensible framework for event based surveillance. In: IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 318–323 (2005)
Hampapur, A., Brown, L., Connell, J., Pankanti, S.: Smart surveillance: applications, technologies and implications. In: Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia, pp. 1133–1138 (2004)
Hu, W., Tieniu, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 34(3), 334–352 (2004)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Li, F.F.: ImageNet: a large-scale hierarchical image database, pp. 248–255 (2009)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893 (2005)
Ess, A., Leibe, B., Van Gool, L.: Depth and appearance for mobile scene analysis. In: IEEE International Conference on Computer Vision, pp. 1–8 (2007)
Enzweiler, M., Gavrila, D.M.: Monocular pedestrian detection: survey and experiments. IEEE Trans. Pattern Anal. Mach. Intell. 31(12), 2179–2195 (2009)
Dollar, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: a benchmark. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311 (2009)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: 17th International Conference on Proceedings of the Pattern Recognition, (ICPR 2004), vol. 3, pp. 32–36 (2004)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Action as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 1395–1402 (2005)
Oh, S., Hoogs, A., Perera, A., Cuntoor, N.: A large-scale benchmark dataset for event recognition in surveillance video. In: Proceedings of IEEE Computer Vision and Pattern Recognition, pp. 3153–3160 (2011)
Over, P., Awad, G.M., Fiscus, J.G., Antonishek, B., Michel, M., Kraaij, W., Smeaton, A.F., Qunot, G.: TRECVID 2015 an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2015. NIST, USA (2015)
Wang, T., Gong, S., Zhu, X., Wang, S.: Person re-identification by video ranking. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 688–703. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10593-2_45
Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Scandinavian Conference on Image Analysis, pp. 91–102 (2011)
Dollr, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 743–761 (2012)
Gkioxari, G., Hariharan, B., Girshick, R., Malik, J.: Using k-poselets for detecting people and localizing their keypoints. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3582–3589 (2014)
Dollar, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1532–1545 (2014)
Nam, W., Dollr, P., Han, J.H.: Local decorrelation for improved detection. Adv. Neural Inf. Process. Syst. 1, 424–432 (2014)
Felzenszwalb, P.F., Girshick, R.B., Mcallester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Softw. Eng. 32(9), 1627–1645 (2014)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
Wang, D., Zhang, C., Cheng, H., Shang, Y., Mei, L.: SPID: surveillance pedestrian image dataset and performance evaluation for pedestrian detection. In: 13th Asian Conference on Computer Vision Workshop on Benchmark and Evaluation of Surveillance Task (2016)
Acknowledgement
This work was partly funded by NSFC (No. 61571297, No. 61371146, No. 61527804, 61521062), 111 Program (B07022), and China National Key Technology R&D Program (No. 2012BAH07B01). The authors also thank the following organizations for their surveillance data supports: SEIEE of Shanghai Jiao Tong University, The Third Research Institute of Ministry of Public Security, Tianjin Tiandy Digital Technology Co., Shanghai Jian Qiao University, and Qingpu Branch of Shanghai Public Security Bureau.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhang, C., Ni, B., Song, L., Zhai, G., Yang, X., Zhang, W. (2017). BEST: Benchmark and Evaluation of Surveillance Task. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science(), vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_29
Download citation
DOI: https://doi.org/10.1007/978-3-319-54526-4_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54525-7
Online ISBN: 978-3-319-54526-4
eBook Packages: Computer ScienceComputer Science (R0)