
Part-Aligned Bilinear Representations for Person Re-identification

  • Yumin Suh
  • Jingdong Wang
  • Siyu Tang
  • Tao Mei
  • Kyoung Mu Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)

Abstract

Comparing the appearance of corresponding body parts is essential for person re-identification. Because body parts are frequently misaligned between detected human boxes, an image representation that can handle this misalignment is required. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses the two feature maps into an image descriptor. We show that this yields a compact descriptor whose image matching similarity is equivalent to an aggregation of the local appearance similarities of the corresponding body parts. Since the image similarity does not depend on the relative positions of parts, our approach significantly alleviates the part misalignment problem. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over state-of-the-art methods on the standard benchmark datasets Market-1501, CUHK03, CUHK01 and DukeMTMC, and on the standard video dataset MARS.
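The similarity equivalence stated in the abstract can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation. It assumes each image is given as two feature maps flattened over spatial locations l (an appearance map with vectors a_l of length c_a and a part map with vectors p_l of length c_p), that the per-location outer products a_l ⊗ p_l are pooled by summation, and that any normalization used by the full model is omitted. The helper names (`bilinear_pool`, `location_pair_similarity`, `per_part_similarity`, `random_maps`) and the random maps are hypothetical. The check confirms that the inner product of two pooled descriptors equals sum_{l,m} <a_l, a'_m> <p_l, p'_m>, which regroups into a sum over part channels k of comparisons between part-weighted appearance aggregates. It also confirms that relocating local features leaves the descriptor unchanged.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def bilinear_pool(app, part):
    """Sum-pool the outer products a_l (x) p_l over all spatial locations l.

    app:  list of appearance vectors, one per location (length c_a each)
    part: list of part vectors, one per location (length c_p each)
    Returns the flattened c_a * c_p descriptor (row-major index i * c_p + k).
    """
    c_a, c_p = len(app[0]), len(part[0])
    desc = [0.0] * (c_a * c_p)
    for a, p in zip(app, part):
        for i in range(c_a):
            for k in range(c_p):
                desc[i * c_p + k] += a[i] * p[k]
    return desc


def location_pair_similarity(app1, part1, app2, part2):
    """Sum over all location pairs of (appearance similarity) * (part similarity)."""
    return sum(dot(a1, a2) * dot(p1, p2)
               for a1, p1 in zip(app1, part1)
               for a2, p2 in zip(app2, part2))


def per_part_similarity(app1, part1, app2, part2):
    """For each part channel k, compare the part-weighted sums of appearance vectors."""
    c_a, c_p = len(app1[0]), len(part1[0])
    total = 0.0
    for k in range(c_p):
        g1 = [sum(p[k] * a[i] for a, p in zip(app1, part1)) for i in range(c_a)]
        g2 = [sum(p[k] * a[i] for a, p in zip(app2, part2)) for i in range(c_a)]
        total += dot(g1, g2)
    return total


def random_maps(n_loc, c, rng):
    """Hypothetical non-negative feature maps (e.g. post-ReLU), flattened over locations."""
    return [[rng.random() for _ in range(c)] for _ in range(n_loc)]


rng = random.Random(0)
n_loc, c_a, c_p = 6, 4, 3
app1, part1 = random_maps(n_loc, c_a, rng), random_maps(n_loc, c_p, rng)
app2, part2 = random_maps(n_loc, c_a, rng), random_maps(n_loc, c_p, rng)

# Three ways of computing the same image-to-image similarity.
s_desc = dot(bilinear_pool(app1, part1), bilinear_pool(app2, part2))
s_pairs = location_pair_similarity(app1, part1, app2, part2)
s_parts = per_part_similarity(app1, part1, app2, part2)

# Move local features to other positions (permute locations jointly in both
# streams): sum pooling makes the descriptor independent of where they sit.
perm = list(range(n_loc))
rng.shuffle(perm)
app2_moved = [app2[j] for j in perm]
part2_moved = [part2[j] for j in perm]
s_moved = dot(bilinear_pool(app1, part1), bilinear_pool(app2_moved, part2_moved))

print(s_desc, s_pairs, s_parts, s_moved)
```

All four printed values agree up to floating-point rounding. The joint permutation is an idealized stand-in for part misalignment: real spatial shifts do not map locations one-to-one. It shows the key property nonetheless. After pooling, what matters is which part a local appearance feature co-occurs with, not where in the box that co-occurrence happens.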

Keywords

Person re-identification · Part alignment · Bilinear pooling

Notes

Acknowledgement

This work was partially supported by Microsoft Research Asia and the Visual Turing Test project (IITP-2017-0-01780) from the Ministry of Science and ICT of Korea.

Supplementary material

Supplementary material 1 (pdf 323 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yumin Suh (1)
  • Jingdong Wang (2)
  • Siyu Tang (3, 4)
  • Tao Mei (5)
  • Kyoung Mu Lee (1)
  1. ASRI, Seoul National University, Seoul, Korea
  2. Microsoft Research Asia, Beijing, China
  3. Max Planck Institute for Intelligent Systems, Tübingen, Germany
  4. University of Tübingen, Tübingen, Germany
  5. JD AI Research, Beijing, China
