Mask-guided dual attention-aware network for visible-infrared person re-identification


Given a person of interest in RGB images, Visible-Infrared Person Re-identification (VI-REID) aims to find the same person in infrared images. The task is challenging because of large cross-modality discrepancies and intra-modality variations caused by illumination, human pose, viewpoint and cluttered backgrounds. This paper proposes a Mask-guided Dual Attention-aware Network (MDAN) for VI-REID. MDAN consists of two individual networks, one for each modality, whose feature representations are driven by mask-guided attention-aware information and multi-loss constraints. Specifically, we first use the masked image as a supplement to the original image, so as to enhance contour and appearance information, which are extremely important cues for matching pedestrian features across the visible and infrared modalities. Second, a Residual Attention Module (RAM) is proposed to capture fine-grained features and subtle differences among pedestrians, learning more discriminative features from heterogeneous modalities by adaptively calibrating feature responses along the channel and spatial dimensions. Third, features from the two modality-specific streams are directly aggregated to form a cross-modality identity representation. Extensive experiments demonstrate that the proposed approach effectively improves performance on the VI-REID task and remarkably outperforms state-of-the-art methods.
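
The abstract does not give the internal design of the masking step or of the RAM, so the following is only a minimal sketch of the two ideas it names: using a masked image as a supplement, and recalibrating feature responses along channel and spatial dimensions with a residual connection. All function names are hypothetical, the learned layers a real module would contain are replaced by plain average pooling and a sigmoid, and the combination rule `x * (1 + a_c * a_s)` is an assumption, not the authors' formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply_mask(image, mask):
    """Zero out background pixels.

    image: C x H x W nested lists; mask: H x W nested lists of 0/1.
    """
    return [
        [[px * m for px, m in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in image
    ]


def channel_attention(feat):
    """One weight per channel, from global average pooling (assumption:
    no learned layers, just a sigmoid of the channel mean)."""
    weights = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weights.append(sigmoid(mean))
    return weights


def spatial_attention(feat):
    """One weight per location, from the mean over channels (assumption)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def residual_attention(feat):
    """Recalibrate features along channel and spatial dimensions.

    The residual form x * (1 + a_c * a_s) keeps the input signal and lets
    the attention weights only amplify it.
    """
    a_c = channel_attention(feat)
    a_s = spatial_attention(feat)
    return [
        [[x * (1.0 + a_c[k] * a_s[i][j]) for j, x in enumerate(row)]
         for i, row in enumerate(channel)]
        for k, channel in enumerate(feat)
    ]
```

In this sketch, `apply_mask` would be applied to an input image with a pedestrian silhouette mask, and `residual_attention` to a feature map inside either modality-specific stream. Because both attention weights lie strictly between 0 and 1, every non-zero response is scaled by a factor between 1 and 2.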





Author information



Corresponding author

Correspondence to Suzhi Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China under Grant 61876056 and Grant 61771180.


About this article


Cite this article

Qi, M., Wang, S., Huang, G. et al. Mask-guided dual attention-aware network for visible-infrared person re-identification. Multimed Tools Appl (2021).



  • Visible-infrared person re-identification
  • Residual attention module
  • Mask-guided recognition