Utilizing Deep Object Detector for Video Surveillance Indexing and Retrieval

Durand, Tom; He, Xiyan; Pop, Ionel; Robinault, Lionel

doi:10.1007/978-3-030-05716-9_41

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Intelligent video surveillance is one of the most challenging tasks in computer vision because of its high requirements for reliability, real-time processing and robustness on low-resolution videos. In this paper we address these challenges with a unified indexing and retrieval system built on recent advances in deep learning. We show that a single-stage object detector such as YOLOv2 can serve as a very efficient tool for event detection, key frame selection and scene recognition. The motivation behind our approach is that the feature maps computed by the deep detector encode not only the categories of the objects present in the image but also their locations, automatically discarding background information. We also tackle the problem of low video quality by introducing a lightweight convolutional network for object description and retrieval. Preliminary experimental results on different video surveillance datasets demonstrate the effectiveness of the proposed system.
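The abstract does not specify how the detector output drives event detection and key frame selection. As a rough illustration only, and not the authors' method, the sketch below assumes the detector (for example YOLOv2) has already produced a list of boxes per frame. It relies on the property the abstract highlights: background yields no detections, so a frame can be summarized by per-category counts of confident boxes. The `Detection` type, the 0.5 confidence threshold and the "signature changed" rule are all hypothetical choices. The paper itself works from the detector's feature maps, which carry richer information than these final boxes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """One box returned by a single-stage detector (hypothetical minimal type)."""
    label: str    # predicted object category, e.g. "person" or "car"
    score: float  # detector confidence in [0, 1]


def frame_signature(dets: Sequence[Detection], min_score: float = 0.5) -> Counter:
    """Count confident detections per category; background produces no boxes,
    so it never contributes to the signature."""
    return Counter(d.label for d in dets if d.score >= min_score)


def select_key_frames(frames: Sequence[Sequence[Detection]],
                      min_score: float = 0.5) -> List[int]:
    """Return indices of key frames (hypothetical rule, for illustration).

    A frame with no confident detection is treated as 'no event' and ends the
    current event. Within an event, a new key frame is emitted whenever the
    per-category counts differ from those of the last key frame.
    """
    keys: List[int] = []
    last: Optional[Counter] = None
    for i, dets in enumerate(frames):
        sig = frame_signature(dets, min_score)
        if not sig:
            last = None  # empty scene: close the current event
            continue
        if last is None or sig != last:
            keys.append(i)
            last = sig
    return keys


if __name__ == "__main__":
    frames = [
        [],                                                  # empty scene
        [Detection("person", 0.9)],                          # event starts
        [Detection("person", 0.8)],                          # same content
        [Detection("person", 0.9), Detection("car", 0.7)],   # content changes
        [Detection("car", 0.3)],                             # below threshold
        [Detection("person", 0.95)],                         # new event
    ]
    print(select_key_frames(frames))  # [1, 3, 5]
```

Because the signature ignores box positions, this toy rule cannot distinguish a person walking across the scene from one standing still. Adding coarse locations (for example, the detector's grid cell) to the signature would be one way to exploit the spatial information the abstract mentions.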

Supported by Foxstream: http://www.foxstream.fr.

Author information

Authors and Affiliations

Foxstream, 69120, Vaulx-en-Velin, France
Tom Durand, Xiyan He, Ionel Pop & Lionel Robinault
INSA-Lyon, 69621, Villeurbanne cedex, France
Tom Durand

Corresponding author

Correspondence to Ionel Pop.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

About this paper

Cite this paper

Durand, T., He, X., Pop, I., Robinault, L. (2019). Utilizing Deep Object Detector for Video Surveillance Indexing and Retrieval. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_41

DOI: https://doi.org/10.1007/978-3-030-05716-9_41
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science; Computer Science (R0)
