Abstract
Despite recent breakthroughs in object detection on static images, extending state-of-the-art detectors from images to video remains challenging: detection accuracy suffers from degraded object appearance in videos, e.g., occlusion, video defocus, and motion blur. In this paper, we present a new framework, Missing Recover Recurrent Neural Networks (MR-RNN), which improves object detection in videos by exploiting temporal information to recover missed objects. First, we run a detector on every frame of the video to obtain bounding boxes and their confidence scores. We then feed these per-frame detections into a recurrent neural network (an LSTM [8] or BiLSTM [4]) to capture temporal information. The method is evaluated on the large-scale vehicle dataset DETRAC. Built on an SSD detector, our approach achieves an Average Precision (AP) of 68.90, an improvement of 2.68 over the SSD baseline. Experimental results show that our method successfully recovers many objects that basic detectors miss.
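The pipeline described above can be sketched in code: a single LSTM cell is run over a temporal sequence of per-frame detection vectors (box coordinates plus confidence score), producing hidden states from which a refined score could be regressed. This is a minimal illustrative sketch, not the authors' implementation; the weight matrices here are random and untrained, and the `lstm_refine` function and the 5-dimensional `[cx, cy, w, h, score]` encoding are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_refine(detections, W, U, b, hidden=8):
    """Run a single LSTM cell over per-frame detection vectors.

    detections: (T, 5) array of [cx, cy, w, h, score], one row per frame.
    Returns the hidden state at every step; a refined confidence score
    would be regressed from these states by a trained output layer.
    """
    T, d = detections.shape
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for t in range(T):
        # All four gates computed in one affine transform, then split.
        z = W @ detections[t] + U @ h + b
        i = sigmoid(z[:hidden])            # input gate
        f = sigmoid(z[hidden:2*hidden])    # forget gate
        o = sigmoid(z[2*hidden:3*hidden])  # output gate
        g = np.tanh(z[3*hidden:])          # candidate cell state
        c = f * c + i * g                  # update cell memory
        h = o * np.tanh(c)                 # emit hidden state
        outputs.append(h.copy())
    return np.stack(outputs)

rng = np.random.default_rng(0)
hidden, d = 8, 5
W = rng.normal(scale=0.1, size=(4 * hidden, d))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# Ten frames of one object track; frame 5 has a low detector
# score, mimicking an occluded or blurred frame to be recovered.
track = rng.uniform(size=(10, 5))
track[5, 4] = 0.05
states = lstm_refine(track, W, U, b, hidden)
print(states.shape)  # (10, 8)
```

A bidirectional variant (BiLSTM [4]) would run a second cell over the reversed sequence and concatenate the two hidden states per frame, letting a frame's refined score draw on both past and future detections.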
References
Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Han, W., et al.: Seq-NMS for video object detection. arXiv preprint arXiv:1602.08465 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_23
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Kang, K., et al.: Object detection in videos with tubelet proposal networks. In: Proceedings of CVPR, vol. 2, p. 7 (2017)
Kang, K., et al.: T-CNN: tubelets with convolutional neural networks for object detection from videos. IEEE Trans. Circuits Syst. Video Technol. (2017)
Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Milan, A., Rezatofighi, S.H., Dick, A.R., Reid, I.D., Schindler, K.: Online multi-target tracking using recurrent neural networks. In: AAAI, vol. 2, p. 4 (2017)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Tripathi, S., Lipton, Z.C., Belongie, S., Nguyen, T.: Context matters: refining object detection in video with recurrent neural networks. arXiv preprint arXiv:1607.04648 (2016)
Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136 (2015)
Xiao, F., Lee, Y.J.: Spatial-temporal memory networks for video object detection. arXiv preprint arXiv:1712.06317 (2017)
Zhu, X., Wang, Y., Dai, J., Yuan, L., Wei, Y.: Flow-guided feature aggregation for video object detection. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 3 (2017)
© 2018 Springer Nature Singapore Pte Ltd.
Shen, R., Wang, W., Zhang, S., Tang, J. (2018). Missing Recover with Recurrent Neural Networks for Video Object Detection. In: Xu, Z., Gao, X., Miao, Q., Zhang, Y., Bu, J. (eds) Big Data. Big Data 2018. Communications in Computer and Information Science, vol 945. Springer, Singapore. https://doi.org/10.1007/978-981-13-2922-7_19
Print ISBN: 978-981-13-2921-0
Online ISBN: 978-981-13-2922-7