Advertisement

Person Search by Multi-Scale Matching

  • Xu LanEmail author
  • Xiatian Zhu
  • Shaogang Gong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

We consider the problem of person search in unconstrained scene images. Existing methods usually focus on improving the person detection accuracy to mitigate negative effects imposed by misalignment, mis-detections, and false alarms resulted from noisy people auto-detection. In contrast to previous studies, we show that sufficiently reliable person instance cropping is achievable by slightly improved state-of-the-art deep learning object detectors (e.g. Faster-RCNN), and the under-studied multi-scale matching problem in person search is a more severe barrier. In this work, we address this multi-scale person search challenge by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model. This is realised by exploiting the in-network feature pyramid structure of a deep neural network enhanced by a novel cross pyramid-level semantic alignment loss function. This favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture. Extensive experiments show the modelling advantages and performance superiority of CLSA over the state-of-the-art person search and multi-scale matching methods on two large person search benchmarking datasets: CUHK-SYSU and PRW.

Keywords

Person search Person detection and re-identification Multi-scale matching Feature pyramid Image pyramid Semantic alignment 

Notes

Acknowledgements

This work was partly supported by the China Scholarship Council, Vision Semantics Limited, the Royal Society Newton Advanced Fellowship Programme (NA150459), and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149).

References

  1. 1.
    Adelson, E.H., Anderson, C.H., Bergen, J.R., Burt, P.J., Ogden, J.M.: Pyramid methods in image processing. RCA Eng. 29(6), 33–41 (1984)Google Scholar
  2. 2.
    Chang, X., Hospedales, T.M., Xiang, T.: Multi-level factorisation net for person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 2 (2018)Google Scholar
  3. 3.
    Chen, W., Chen, X., Zhang, J., Huang, K.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)Google Scholar
  4. 4.
    Chen, X., Gupta, A.: An implementation of faster RCNN with study for region sampling. arXiv (2017)Google Scholar
  5. 5.
    Chen, Y., Zhu, X., Gong, S.: Person re-identification by deep learning multi-scale representations. In: Workshop of IEEE International Conference on Computer Vision, pp. 2590–2600 (2017)Google Scholar
  6. 6.
    Chen, Y.C., Zhu, X., Zheng, W.S., Lai, J.H.: Person re-identification by camera correlation aware feature augmentation. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 392–408 (2018)CrossRefGoogle Scholar
  7. 7.
    Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344 (2016)Google Scholar
  8. 8.
    Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2005)Google Scholar
  9. 9.
    Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1532–1545 (2014)CrossRefGoogle Scholar
  10. 10.
    Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)CrossRefGoogle Scholar
  11. 11.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)Google Scholar
  12. 12.
    Gong, S., Cristani, M., Yan, S., Loy, C.C.: Person Re-Identification. Springer, London (2014).  https://doi.org/10.1007/978-1-4471-6296-4CrossRefzbMATHGoogle Scholar
  13. 13.
    He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European Conference on Computer Vision, pp. 346–361 (2014)Google Scholar
  14. 14.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  15. 15.
    Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv (2017)Google Scholar
  16. 16.
    Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv (2015)Google Scholar
  17. 17.
    Jiao, J., Zheng, W.S., Wu, A., Zhu, X., Gong, S.: Deep low-resolution person re-identification. In: AAAI Conference on Artificial Intelligence (2018)Google Scholar
  18. 18.
    Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2288–2295 (2012)Google Scholar
  19. 19.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)Google Scholar
  20. 20.
    Lan, X., Wang, H., Gong, S., Zhu, X.: Deep reinforcement learning attention selection for person re-identification. arXiv (2017)Google Scholar
  21. 21.
    Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition (2006)Google Scholar
  22. 22.
    Li, M., Zhu, X., Gong, S.: Unsupervised person re-identification by deep learning tracklet association. In: European Conference on Computer Vision (2018)Google Scholar
  23. 23.
    Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: Deep filter pairing neural network for person re-identification. In: CVPR (2014)Google Scholar
  24. 24.
    Li, W., Zhu, X., Gong, S.: Person re-identification by deep joint learning of multi-loss classification. arXiv (2017)Google Scholar
  25. 25.
    Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)Google Scholar
  26. 26.
    Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: IEEE International Conference on Computer Vision, pp. 2197–2206 (2015)Google Scholar
  27. 27.
    Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)Google Scholar
  28. 28.
    Liu, H., et al.: Neural person search machines. In: IEEE International Conference on Computer Vision (2017)Google Scholar
  29. 29.
    Liu, J., et al.: Multi-scale triplet CNN for person re-identification. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 192–196. ACM (2016)Google Scholar
  30. 30.
    Liu, W., Rabinovich, A., Berg, A.C.: Parsenet: looking wider to see better. arXiv (2015)Google Scholar
  31. 31.
    Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)MathSciNetCrossRefGoogle Scholar
  32. 32.
    Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2014)Google Scholar
  33. 33.
    Paszke, A., Gross, S., Chintala, S., Chanan, G.: Pytorch (2017)Google Scholar
  34. 34.
    Qian, X., Fu, Y., Jiang, Y.G., Xiang, T., Xue, X.: Multi-scale deep learning architectures for person re-identification. In: IEEE International Conference on Computer Vision (2017)Google Scholar
  35. 35.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)Google Scholar
  36. 36.
    Wang, H., Gong, S., Zhu, X., Xiang, T.: Human-in-the-loop person re-identification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 405–422. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46493-0_25CrossRefGoogle Scholar
  37. 37.
    Wang, H., Zhu, X., Gong, S., Xiang, T.: Person re-identification in identity regression space. Int. J. Comput. Vis. (2018)Google Scholar
  38. 38.
    Wang, J., Zhu, X., Gong, S., Li, W.: Transferable joint attribute-identity deep learning for unsupervised person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)Google Scholar
  39. 39.
    Wang, T., Gong, S., Zhu, X., Wang, S.: Person re-identification by video ranking. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 688–703. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10593-2_45CrossRefGoogle Scholar
  40. 40.
    Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1249–1258 (2016)Google Scholar
  41. 41.
    Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3376–3385 (2017)Google Scholar
  42. 42.
    Yang, B., Yan, J., Lei, Z., Li, S.Z.: Convolutional channel features. In: IEEE International Conference on Computer Vision, pp. 82–90 (2015)Google Scholar
  43. 43.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10590-1_53CrossRefGoogle Scholar
  44. 44.
    Zhang, Y., Xiang, T., Hospedales, T.M., Lu, H.: Deep mutual learning. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)Google Scholar
  45. 45.
    Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3586–3593 (2013)Google Scholar
  46. 46.
    Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision, pp. 1116–1124 (2015)Google Scholar
  47. 47.
    Zheng, L., Zhang, H., Sun, S., Chandraker, M., Tian, Q.: Person re-identification in the wild. arXiv (2017)Google Scholar
  48. 48.
    Zhu, X., Wu, B., Huang, D., Zheng, W.S.: Fast openworld person re-identification. IEEE Trans. Image Process. (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.Queen Mary University of LondonLondonUK
  2. 2.Vision Semantics Ltd.LondonUK

Personalised recommendations