
Missing Recover with Recurrent Neural Networks for Video Object Detection

  • Conference paper
  • Big Data (Big Data 2018)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 945)

Abstract

Despite recent breakthroughs in object detection on static images, extending state-of-the-art object detectors from images to video is challenging. Detection accuracy suffers from degenerated object appearances in videos, such as occlusion, video defocus, and motion blur. In this paper, we present a new framework called Missing Recover Recurrent Neural Networks (MR-RNN) for improving object detection in videos, which captures temporal information to recover missing objects. First, we detect objects in consecutive frames to obtain bounding boxes and their confidence scores; the detector is trained on every frame of the video. Then we feed these detections into a recurrent neural network (LSTM [8] or BiLSTM [4]) to capture temporal information. The method is evaluated on a large-scale vehicle dataset, “DETRAC”. Our approach achieves an Average Precision (AP) of 68.90 based on the SSD detector, an improvement of 2.68 over SSD alone. Experimental results show that our method successfully detects many objects that are missed by the baseline detector.
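
The abstract describes a two-stage pipeline: a per-frame detector produces bounding boxes and confidence scores, and an LSTM or BiLSTM then uses temporal context to recover detections the per-frame detector misses. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the feature layout ([x1, y1, x2, y2, confidence] per frame), the module name DetectionRefiner, and the confidence-regression head are assumptions made for illustration.

import torch
import torch.nn as nn

class DetectionRefiner(nn.Module):
    """Refines per-frame detection confidences using temporal context.
    Hypothetical module for illustration, not the paper's released code."""

    def __init__(self, feat_dim=5, hidden_dim=64, bidirectional=True):
        super().__init__()
        # LSTM (or BiLSTM) over the sequence of per-frame detection features.
        self.rnn = nn.LSTM(feat_dim, hidden_dim,
                           batch_first=True,
                           bidirectional=bidirectional)
        out_dim = hidden_dim * (2 if bidirectional else 1)
        # Predict a refined confidence score for every frame.
        self.score_head = nn.Linear(out_dim, 1)

    def forward(self, det_seq):
        # det_seq: (batch, num_frames, feat_dim), one feature vector per frame,
        # e.g. [x1, y1, x2, y2, confidence] for a tracked bounding box
        # (assumed layout; zeros could stand in for frames with no detection).
        rnn_out, _ = self.rnn(det_seq)
        return torch.sigmoid(self.score_head(rnn_out)).squeeze(-1)

# Toy usage: 2 tracks, 10 consecutive frames, 5 features per detection.
if __name__ == "__main__":
    model = DetectionRefiner()
    dets = torch.rand(2, 10, 5)
    refined = model(dets)  # shape: (2, 10), one refined score per frame
    print(refined.shape)

In this sketch, the refined score for a frame draws on neighboring frames through the (Bi)LSTM hidden state, which is how a box suppressed by occlusion or blur in one frame could in principle be recovered from adjacent frames.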

References

  1. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)

  2. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  3. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)

  4. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)

  5. Han, W., et al.: Seq-NMS for video object detection. arXiv preprint arXiv:1602.08465 (2016)

  6. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_23

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  9. Kang, K., et al.: Object detection in videos with tubelet proposal networks. In: Proceedings of CVPR, vol. 2, p. 7 (2017)

  10. Kang, K., et al.: T-CNN: tubelets with convolutional neural networks for object detection from videos. IEEE Trans. Circuits Syst. Video Technol. (2017)

  11. Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825 (2016)

  12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  13. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  14. Milan, A., Rezatofighi, S.H., Dick, A.R., Reid, I.D., Schindler, K.: Online multi-target tracking using recurrent neural networks. In: AAAI, vol. 2, p. 4 (2017)

  15. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  18. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)

  19. Tripathi, S., Lipton, Z.C., Belongie, S., Nguyen, T.: Context matters: refining object detection in video with recurrent neural networks. arXiv preprint arXiv:1607.04648 (2016)

  20. Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136 (2015)

  21. Xiao, F., Lee, Y.J.: Spatial-temporal memory networks for video object detection. arXiv preprint arXiv:1712.06317 (2017)

  22. Zhu, X., Wang, Y., Dai, J., Yuan, L., Wei, Y.: Flow-guided feature aggregation for video object detection. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 3 (2017)

Author information

Corresponding author: Correspondence to Jin Tang.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shen, R., Wang, W., Zhang, S., Tang, J. (2018). Missing Recover with Recurrent Neural Networks for Video Object Detection. In: Xu, Z., Gao, X., Miao, Q., Zhang, Y., Bu, J. (eds) Big Data. Big Data 2018. Communications in Computer and Information Science, vol 945. Springer, Singapore. https://doi.org/10.1007/978-981-13-2922-7_19

  • DOI: https://doi.org/10.1007/978-981-13-2922-7_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2921-0

  • Online ISBN: 978-981-13-2922-7

  • eBook Packages: Computer Science (R0)
