
Selective deep ensemble for instance retrieval

  • Zhengyan Ding
  • Lei Song
  • Xiaoteng Zhang
  • Zheng Xu
Article

Abstract

In public security systems, the demand for visual instance retrieval is growing explosively, especially over large-scale image and video databases. Because of its wide range of applications in surveillance scenarios, this paper focuses on retrieval tasks centered on ‘vehicle’ and ‘pedestrian’ targets. Many previous CNN-based methods do not exploit the complementary abilities of different models and thus achieve limited accuracy, since no single deep architecture is comprehensive. Moreover, some features in the original deep representation are useless for retrieval, whereas an attention-aware compact representation is far more efficient and effective. To address these problems, we propose a Selective Deep Ensemble (SDE) framework that combines various models and features in a complementary way, inspired by the attention mechanism. We demonstrate that a large improvement can be obtained with only a slight increase in computational cost. Finally, we evaluate the performance on three public instance-retrieval datasets, VehicleID, VeRi and Market-1501, outperforming state-of-the-art methods by a large margin.
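
The abstract does not spell out the SDE pipeline, but a minimal sketch of the general idea (assuming hypothetical pre-extracted CNN descriptors, a crude activation-based score standing in for the attention weighting, and simple top-k channel selection before concatenation; none of these specifics are taken from the paper) might look as follows:

import numpy as np

def fit_selection(feature_sets, keep_ratio=0.5):
    """Hypothetical 'selective' step: score each dimension of every model's
    descriptor by its mean absolute activation (a crude attention proxy)
    and keep the top keep_ratio fraction of dimensions per model."""
    rules = []
    for feats in feature_sets:
        scores = np.abs(feats).mean(axis=0)
        k = max(1, int(keep_ratio * feats.shape[1]))
        keep = np.argsort(scores)[::-1][:k]          # indices of kept dims
        rules.append((keep, scores[keep]))
    return rules

def fuse(feature_sets, rules):
    """Apply the selection rules, re-weight the kept dimensions, concatenate
    across models and L2-normalise so cosine similarity is a dot product."""
    parts = [feats[:, keep] * w for feats, (keep, w) in zip(feature_sets, rules)]
    fused = np.concatenate(parts, axis=1)
    return fused / (np.linalg.norm(fused, axis=1, keepdims=True) + 1e-12)

# Toy usage with random stand-ins for two CNN descriptors (e.g. 2048-D and
# 1024-D); the selection rule is fit on the gallery and reused for queries.
rng = np.random.default_rng(0)
gal_feats = [rng.normal(size=(100, 2048)), rng.normal(size=(100, 1024))]
qry_feats = [rng.normal(size=(5, 2048)), rng.normal(size=(5, 1024))]
rules = fit_selection(gal_feats)
gallery, queries = fuse(gal_feats, rules), fuse(qry_feats, rules)
ranks = np.argsort(-queries @ gallery.T, axis=1)     # best-matching gallery ids first
print(ranks[:, :5])

In this toy setup the selection rule is fit once on the gallery descriptors and reused for the queries, so both sides land in the same fused space; the actual selection and weighting strategy of the SDE framework is described in the full paper.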

Keywords

Instance retrieval · Vehicle and pedestrian · Selective deep ensemble

Notes

Acknowledgements

The authors of this paper are members of the Shanghai Engineering Research Center of Intelligent Video Surveillance. Dr. Lei Song is also a visiting researcher with the Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China. Our research was sponsored by the following projects: the National Natural Science Foundation of China (61402116, 61403084); the Program of the Science and Technology Commission of Shanghai Municipality (No. 15530701300, No. 15XD1520200, No. 17511106803); the 2012 IoT Program of the Ministry of Industry and Information Technology of China; the Key Project of the Ministry of Public Security (No. 2014JSYJA007); the Project of the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University (ESSCKF 2015-03); the Shanghai Rising-Star Program (17QB1401000); the Special Fund for Basic R&D Expenses of Central Level Public Welfare Scientific Research Institutions (C17384); and the National Key R&D Program of China (2016YFC0801304, 2017YFC0803705). This work was also supported by the CCF-Venustech Open Research Fund (Grant No. CCF-VenustechRP2017006) and the Guangxi Key Laboratory of Cryptography and Information Security (No. GCIS201719).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Zhengyan Ding (1)
  • Lei Song (1, 2)
  • Xiaoteng Zhang (1)
  • Zheng Xu (1, 2)
  1. The Third Research Institute of the Ministry of Public Security, Shanghai, China
  2. Shenzhen Key Laboratory of Media Security, Shenzhen University & Guangxi Key Laboratory of Cryptography and Information Security, Shenzhen and Guilin, China
