Efficient lightweight video person re-identification with online difference discrimination module

Abstract

Video person re-identification (video Re-ID) is a key technology in video surveillance and security. Conventional person re-identification retrieves the correct match for a target image (the query) from a set of gallery images, while video Re-ID extends this to retrieving matches from gallery videos. Two main factors determine the performance of a video Re-ID model: (i) a high-quality frame-level feature extractor, and (ii) temporal modeling that combines frame-level features into a single feature for retrieval. In this work, we use a lightweight ShuffleNet V2-based algorithm for video Re-ID, which meets the demands of practical applications by reducing the consumption of computing resources while maintaining high performance. In addition, we use the lightweight spatial attention mechanism Spatial Group-wise Enhance (SGE) to capture finer details of the person, which makes the feature representation more compact and effectively improves retrieval accuracy. Finally, we design an Online Difference Discrimination (ODD) module that measures the feature gap between video frames, and we use it to apply different temporal modeling to video sequences of different quality. Experiments on three datasets (iLIDS-VID, PRID2011 and MARS) show that our method is competitive with state-of-the-art methods.
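The idea of measuring the gap between frame-level features and then choosing a temporal model accordingly can be illustrated with a small sketch. The abstract does not specify how the ODD module computes the gap, which temporal models it selects between, or any threshold, so everything below is an assumption made for illustration: the function names, the mean-distance gap score, the threshold value, and the choice between average pooling and distance-weighted pooling are all hypothetical and are not the authors' method.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Vector]) -> Vector:
    """Element-wise mean of the frame-level features."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_gap(frames: Sequence[Vector]) -> float:
    """Hypothetical gap score: mean distance of each frame to the clip mean."""
    centre = mean_vector(frames)
    return sum(euclidean(f, centre) for f in frames) / len(frames)


def aggregate(frames: Sequence[Vector], threshold: float = 0.5) -> Vector:
    """Choose a temporal model from the gap (assumed rule, not the paper's)."""
    centre = mean_vector(frames)
    if frame_gap(frames) <= threshold:
        # Consistent sequence: plain average pooling is enough.
        return centre
    # Inconsistent sequence: down-weight frames far from the clip mean.
    weights = [math.exp(-euclidean(f, centre)) for f in frames]
    total = sum(weights)
    dim = len(frames[0])
    return [
        sum(w * f[i] for w, f in zip(weights, frames)) / total
        for i in range(dim)
    ]


def rank_gallery(query: Vector, gallery: Sequence[Vector]) -> List[int]:
    """Gallery indices sorted from nearest to farthest from the query."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))
```

In this sketch, a sequence whose frames agree is averaged directly, while a sequence containing an outlier frame (for example, an occluded one) yields a clip feature that stays closer to the majority of frames. The resulting clip features would then be compared with `rank_gallery` to retrieve the nearest gallery videos.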

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (No. 61772530, No. 61806206, No. 61876121), in part by the State’s Key Project of Research and Development Plan of China (No. 2016YFC0600908), in part by the Natural Science Foundation of Jiangsu Province of China (No. BK20171192, No. BK20180639), in part by the Six Talent Peaks Project in Jiangsu Province (No. 2018-XYDXX-044), in part by the Open Foundation of the Suzhou Smart City Research Institute, Suzhou University of Science and Technology (No. SZSCR2019005), and in part by the Xuzhou Science and Technology Plan Funds (No. KC19005).

Author information

Corresponding author

Correspondence to Rui Yao.

About this article

Cite this article

Gao, C., Yao, R., Zhou, Y. et al. Efficient lightweight video person re-identification with online difference discrimination module. Multimed Tools Appl (2021). https://doi.org/10.1007/s11042-021-10543-6

Keywords

  • Video person re-identification
  • Spatial attention
  • Video temporal modeling