Unsupervised domain adaption for image-to-video person re-identification

  • Xinyu Zhang
  • Sen Li
  • Xiao-Yuan Jing (corresponding author)
  • Fei Ma
  • Chen Zhu


Person re-identification has recently been applied successfully in many fields, such as suspect tracking and locating missing persons. Because video carries more useful information than a single image, a growing number of researchers focus on video-based person re-identification, and in particular on image-to-video person re-identification (IVPR). However, most existing IVPR models are supervised. In practice, labeling enough training samples requires substantial manual effort, which limits the practical value of these models. In addition, the 2D features extracted from pedestrian images and the 3D features extracted from pedestrian videos are heterogeneous, which poses a significant challenge for the IVPR task. To address these problems effectively, we propose an unsupervised domain adaption image-to-video person re-identification model based on a cross-modal feature generating and target information preserving transfer network (CMGTN). On the one hand, the generator in our model not only transforms the features of unlabeled target-domain samples into the source-domain feature space, but also preserves target identity information. On the other hand, we eliminate the gap between pedestrian images and videos by embedding a cross-modal loss term. To evaluate our approach, we conduct extensive experiments on the PRID-2011, iLIDS-VID and MARS datasets, and compare it with existing state-of-the-art IVPR models, including four unsupervised methods and three supervised methods. Experimental results demonstrate the effectiveness of our approach.
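The abstract names three ingredients: a generator that moves target-domain features into the source feature space, a term that preserves target identity information, and a cross-modal loss between images and videos. The following is only a minimal sketch of how such terms might be combined, not the CMGTN formulation itself. Every function name, the linear generator, the logistic discriminator, the pairwise-distance proxy for identity preservation, and the weights `lam_pre` and `lam_cm` are assumptions made for illustration.

```python
import math

def generate(target_feats, weight):
    """Hypothetical linear generator: map target-domain features toward the source space."""
    return [[sum(x[k] * weight[k][j] for k in range(len(x)))
             for j in range(len(weight[0]))] for x in target_feats]

def adversarial_loss(gen_feats, disc_w):
    """Generator-side GAN loss against an assumed logistic discriminator."""
    total = 0.0
    for f in gen_feats:
        score = sum(a * b for a, b in zip(f, disc_w))
        prob_source = 1.0 / (1.0 + math.exp(-score))  # P(feature looks like source)
        total -= math.log(prob_source + 1e-12)
    return total / len(gen_feats)

def pairwise_distances(feats):
    """Euclidean distance between every pair of feature vectors."""
    return [[math.dist(a, b) for b in feats] for a in feats]

def preservation_loss(target_feats, gen_feats):
    """Assumed proxy for identity preservation: keep the pairwise distances
    among the unlabeled target samples unchanged after generation."""
    before = pairwise_distances(target_feats)
    after = pairwise_distances(gen_feats)
    n = len(target_feats)
    return sum((before[i][j] - after[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def cross_modal_loss(image_feats, video_feats):
    """Mean squared distance between paired image and video features
    (the pairing by person is an assumption of this sketch)."""
    return sum(math.dist(a, b) ** 2
               for a, b in zip(image_feats, video_feats)) / len(image_feats)

def generator_objective(target_feats, weight, disc_w, image_feats, video_feats,
                        lam_pre=1.0, lam_cm=1.0):
    """Weighted sum of the three illustrative terms."""
    gen = generate(target_feats, weight)
    return (adversarial_loss(gen, disc_w)
            + lam_pre * preservation_loss(target_feats, gen)
            + lam_cm * cross_modal_loss(image_feats, video_feats))
```

In this sketch, an identity generator leaves the preservation term at zero, and identical image and video features leave the cross-modal term at zero, so only the adversarial term remains. A real model would replace the linear maps with deep networks and train the generator and discriminator alternately.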


Keywords: Unsupervised domain adaption; Image-to-video; Person re-identification; GAN; Deep learning



The authors would like to thank the editor, the associate editor, and the anonymous reviewers for their constructive comments, which helped improve this work. This work was supported by the NSFC-Key Project under Grant No. 61933013, the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the Key Project of Natural Science Foundation of Hubei Province under Grant No. 2018CFA024, and the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515011076.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. School of Computer Science, Wuhan University, Wuhan, China
  2. School of Computer, Guangdong University of Petrochemical Technology, Maoming, China
  3. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China
