Deep feature representation and multiple metric ensembles for person re-identification in security surveillance system

Abstract

With growing concern about public security and the development of large-scale data storage technology, person re-identification in security surveillance systems has become a hot topic. Large variations in viewpoint and lighting across camera views can substantially change a person's appearance, so person re-identification remains a challenging problem. Developing robust feature descriptors and designing discriminative distance metrics to measure the similarity between pedestrian images are therefore two key aspects of person re-identification. In this paper, we propose a method that uses both deep learning and multiple metric ensembles to improve re-identification performance. First, we jointly use several datasets to train a general Convolutional Neural Network (CNN), which is then employed to extract deep features from the training and testing sets. The deep architecture makes it possible to learn more abstract, intrinsic features that are robust to variations in viewpoint and lighting. Then, we use the deep features of the training set to learn a dataset-specific distance metric and combine it with the cosine distance metric; the resulting multiple metric ensemble measures the similarity between images more comprehensively. Finally, extensive experiments demonstrate that our method effectively improves recognition performance compared with state-of-the-art methods.
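The idea of combining a learned metric with the cosine distance can be illustrated with a minimal sketch. The abstract does not say how the two metrics are fused, so the weighted sum below, the weight `alpha`, the function names, and the use of a Mahalanobis-style matrix `M` for the learned metric are all assumptions made for illustration; the feature vectors stand in for CNN outputs.

```python
import numpy as np


def cosine_distance(x, y):
    """Cosine distance: 1 minus the cosine similarity of two 1-D vectors."""
    return 1.0 - float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))


def mahalanobis_distance(x, y, M):
    """Distance under a positive semi-definite matrix M.

    M is a placeholder for a metric learned on the training features;
    how it is learned is outside the scope of this sketch.
    """
    d = x - y
    # Clamp at zero to guard against tiny negative values from rounding.
    return float(np.sqrt(max(d @ M @ d, 0.0)))


def ensemble_distance(x, y, M, alpha=0.5):
    """Hypothetical fusion rule: weighted sum of the two distances."""
    return (alpha * mahalanobis_distance(x, y, M)
            + (1.0 - alpha) * cosine_distance(x, y))


def rank_gallery(probe, gallery, M, alpha=0.5):
    """Return gallery indices sorted from most to least similar to the probe."""
    dists = np.array([ensemble_distance(probe, g, M, alpha) for g in gallery])
    return np.argsort(dists)


# Toy usage with 2-D stand-in features and an identity matrix as the metric.
probe = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
ranking = rank_gallery(probe, gallery, np.eye(2))
```

The two distances live on different scales (the cosine distance lies in [0, 2], while the Mahalanobis-style distance is unbounded), so in practice some normalization before fusion would likely be needed; the sketch leaves that out.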

Keywords

Person re-identification; Convolutional neural network; Multiple metric ensembles; Video surveillance

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. School of Computer and Information, Hefei University of Technology, Hefei, China
