1 Introduction

Person re-identification is the task of matching the same identity across non-overlapping camera views [1, 2, 15]. Due to illumination variations, human pose changes, and the different backgrounds of multiple cameras, it is hard to match people from one camera to another. Although many approaches have been proposed from various standpoints, it remains an open problem.

There are two common ways to address the problems above in person re-identification: feature representation [3, 4, 5, 27, 28] and metric learning [4, 6, 7, 16, 17]. Feature representation mainly focuses on how to describe the discriminative information of the object. A color invariant model was proposed in [3], which designs a variable whose change is smaller than the original change, making it robust to illumination variations. Lisanti [4] exploited five low-level color and texture features to represent individuals in a complementary way. In [5], a novel salient color name descriptor was presented, which projects pixels onto a probability distribution over sixteen color names. Besides, it is important to learn an efficient metric to measure the distance between two vectors [6, 29, 30]. The metric learned by the algorithm of [6], which minimizes the distance between features of true match pairs while maximizing the distance between features of wrong match pairs, greatly improves the performance of metric learning in person re-identification. The KISS (Keep It Simple and Straightforward) algorithm [6] was studied in depth in [7], where it was found that the commonness of two vectors can also describe their similarity.

Generally, the metric is learned from feature vectors extracted from images. However, because these features are not linearly separable, the learned metric is not discriminative. To solve this problem, kernel-based approaches [4, 8, 9, 18] exploit the kernel trick to map the features into a kernel space, enhancing their linear separability, and then learn the metric in that kernel space. In [8], the feature vector was projected into kernel space and the metric was learned with the KISS algorithm. To obtain an efficient kernel space, multiple kernels were used in [9] to map features into more than one subspace. These approaches prove the effectiveness of kernel mapping and achieve good performance. However, if the dimension of the mapped features is high, the kernel representation obtained by the kernel trick is not suitable, and discriminative information may be lost in the process of calculating the kernel matrix. Besides, different features may not be well modeled by the same transformation function [10], and the joint feature space may be too complex to be robustly handled by a single metric. We therefore propose to map the features separately and learn a metric in each corresponding kernel space. Finally, the multiple metric measures are weighted to obtain the final result.

The overview in Fig. 1 depicts the architecture of our proposed approach. First, color, shape, and texture features are extracted from the images, and each is mapped into its own kernel space. In each subspace, similarity is measured with the LSSL (Large Scale Similarity Learning) algorithm [7] to obtain a per-feature distance. Finally, the distances are combined using weights learned by a modified Relief algorithm to produce the result.

The main contributions of this paper are threefold: (1) multiple metric learning in kernel space is proposed to avoid the loss of discriminative information in producing the kernel matrix, and the Relief algorithm [11] is modified to obtain the weights of the metrics; (2) Grey-world normalization is introduced to enhance the discrimination of the color descriptors; (3) extensive experiments analyze the individual aspects of our approach, and the final results outperform the state of the art on two benchmarks.

Fig. 1. The overview of the proposed approach.

2 Our Approach

2.1 Person Representation

Color histograms [19] are very important features and are commonly used in person re-identification. However, they are sensitive to changing illumination and to the color responses of cameras. Grey-world normalization [12] is a common and effective technique to address this problem; it assumes that the average color of an image captured by the camera is grey. It divides each channel of every pixel by the average of that channel in RGB color space, i.e. \(R_1=R/\mathrm{mean}(R)\). If the illumination or the color responses change, the RGB values of the pixels may change a lot, but \(R_1G_1B_1\) changes much less than RGB, so Grey-world normalization is robust to these changes. Here, we apply this technique in RGB color space and then transform into the nRGB color space to enhance its robustness. The experiments below show its effectiveness.
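As a concrete illustration, the sketch below applies Grey-world normalization to an RGB image with NumPy; the guard against zero-mean channels is our own safeguard, not specified in the paper.

```python
import numpy as np

def grey_world_normalize(img):
    """Divide each RGB channel by its mean, i.e. R1 = R / mean(R)."""
    img = img.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)      # per-channel average color
    return img / np.maximum(means, 1e-8)         # guard against empty channels
```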

Besides, three other color spaces, HSV, RGS, and YCbCr, are used to describe the color information in a complementary way. For each channel, a 16-bin histogram is extracted from each of six non-overlapping horizontal stripes of the image [6], i.e. \(16\times 6\times 3=288\) dimensions for a three-channel color space. For shape and texture, HOG and LBP features are used [4], except that we crop 8 pixels from the top and bottom of each image to reduce the influence of background pixels.
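A minimal sketch of the stripe-based histogram described above follows; the paper gives no implementation details, so the bin range and the per-stripe L1 normalization are assumptions.

```python
import numpy as np

def stripe_histograms(channel, n_strips=6, n_bins=16):
    """16-bin histogram per horizontal stripe for one color channel.

    `channel` is an H x W array scaled to [0, 1]; concatenating the
    outputs of three channels gives the 16 x 6 x 3 = 288-dim descriptor.
    """
    edges = np.linspace(0, channel.shape[0], n_strips + 1).astype(int)
    hists = []
    for i in range(n_strips):
        strip = channel[edges[i]:edges[i + 1]]
        h, _ = np.histogram(strip, bins=n_bins, range=(0.0, 1.0))
        hists.append(h / max(h.sum(), 1))        # L1-normalize each stripe (assumption)
    return np.concatenate(hists)
```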

After extracting the features from the images, each feature is projected into its own kernel space using the exponential Chi-Square kernel to obtain its kernel representation [4].
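The sketch below shows one common form of the exponential chi-square kernel; the bandwidth heuristic (inverse mean chi-square distance) is an assumption, as [4] may use a different parameterization.

```python
import numpy as np

def chi2_exp_kernel(X, Y, gamma=None):
    """k(x, y) = exp(-gamma * chi2(x, y)) for nonnegative histogram rows."""
    num = (X[:, None, :] - Y[None, :, :]) ** 2
    den = X[:, None, :] + Y[None, :, :] + 1e-12
    chi2 = 0.5 * (num / den).sum(axis=-1)        # pairwise chi-square distances
    if gamma is None:
        gamma = 1.0 / max(chi2.mean(), 1e-12)    # common bandwidth heuristic
    return np.exp(-gamma * chi2)
```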

2.2 Weighted Multiple Metrics in Kernel Space

In this section, we propose multiple metric learning in kernel space. The per-feature distances obtained by the LSSL algorithm are weighted to form the final distance:

$$d_f = \sum _{n=1}^{p} w_n d_n$$
(1)

where \(p\) is the number of features, \(d_n\) is the distance computed from the \(n\)-th feature, and \(w_n\) denotes its weight, with \( \sum \limits _{n=1}^{p} w_n = 1 \).

The weights of the distances should depend on the importance of the corresponding features: the more important a feature is, the larger its weight should be. To obtain \(w_n\), the classic Relief algorithm [11] can be used to estimate the importance of each feature. However, Relief must be modified for the re-id problem, because it is designed for binary classification. We therefore put the images belonging to the same person into one category and all other images into another. Besides, the sampling step of Relief is not needed here, because the number of images of the same individual is small, e.g. two. We feed all training samples into the weight computation and accumulate the resulting scores. Finally, the accumulated importance values are normalized to yield the weights. The procedure is summarized in Algorithm 1 below.

Algorithm 1. The modified Relief algorithm for learning feature weights.
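Since Algorithm 1 is given as a figure, the sketch below approximates the modified Relief weighting in NumPy under our reading of the text: each sample's nearest same-identity and nearest different-identity neighbors drive a per-feature score, all training samples contribute, and the accumulated scores are normalized into weights. The exact update rule of the original figure may differ.

```python
import numpy as np

def modified_relief_weights(dist_mats, labels):
    """Approximate modified-Relief feature weights.

    dist_mats: list of n x n distance matrices, one per feature type.
    labels:    identity label of each of the n training samples.
    """
    labels = np.asarray(labels)
    n = len(labels)
    scores = np.zeros(len(dist_mats))
    for f, D in enumerate(dist_mats):
        Dn = D / max(D.max(), 1e-12)             # normalize maximum distance to 1
        for i in range(n):
            same = labels == labels[i]
            same[i] = False                      # exclude the sample itself
            diff = labels != labels[i]
            if not same.any():
                continue
            hit = Dn[i, same].min()              # nearest same-identity sample
            miss = Dn[i, diff].min()             # nearest other-identity sample
            scores[f] += miss - hit              # accumulate over all samples
    scores = np.maximum(scores, 0)
    return scores / max(scores.sum(), 1e-12)     # weights sum to one
```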

2.3 Metric Learning Algorithm

In our approach, LSSL [7] is used to learn the metrics in the kernel spaces. Given two samples \(x\) and \(y\), the similarity distance is defined by LSSL as:

$$d=(x+y)^{T}M(x+y)-\lambda (x-y)^{T}W(x-y)$$
(2)

where \(\lambda \) is a constant, and \(M\) and \(W\) are two metrics learned as follows:

$$M=\Sigma _{mD}^{-1}-\Sigma _{mS}^{-1}$$
(3)
$$W=\Sigma _{eS}^{-1}-\Sigma _{eD}^{-1}$$
(4)

where, for the \(N\) same-identity training pairs \((x_i, y_i)\),

$$\Sigma _{mS}=\frac{1}{N}\sum \limits _{i=1}^{N}(x_i+y_i)(x_i+y_i)^{T}$$
(5)
$$\Sigma _{eS}=\frac{1}{N}\sum \limits _{i=1}^{N}(x_i-y_i)(x_i-y_i)^{T}$$
(6)
$$\Sigma _{mD}=\Sigma _{eD}=\frac{1}{2N}\sum \limits _{i=1}^{N}\left[(x_i+y_i)(x_i+y_i)^{T}+(x_i-y_i)(x_i-y_i)^{T}\right]$$
(7)

In this paper, Algorithm 1 is used to obtain the feature weights. In the experiments, however, we noticed that the maximum distance differs from feature to feature, so the multiple distances cannot simply be weighted to obtain the final distance; the distances of each feature are first normalized to the same maximum and then weighted.
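A compact sketch of the scoring pipeline under Eqs. (2)-(7) follows; the covariance regularization and the max-normalization constant are our own safeguards, not part of the paper.

```python
import numpy as np

def learn_lssl_metrics(X, Y, lam=1.5):
    """Learn M and W from N positive pairs; rows of X, Y are the two views."""
    s, e = X + Y, X - Y                          # sums and differences of pairs
    cov = lambda Z: Z.T @ Z / len(Z)
    sig_mS, sig_eS = cov(s), cov(e)              # Eqs. (5) and (6)
    sig_D = 0.5 * (sig_mS + sig_eS)              # Eq. (7): sig_mD = sig_eD
    reg = 1e-6 * np.eye(X.shape[1])              # regularization (our safeguard)
    inv = lambda S: np.linalg.inv(S + reg)
    M = inv(sig_D) - inv(sig_mS)                 # Eq. (3)
    W = inv(sig_eS) - inv(sig_D)                 # Eq. (4)
    return M, W, lam

def lssl_distance(x, y, M, W, lam):
    """Similarity distance of Eq. (2)."""
    s, e = x + y, x - y
    return s @ M @ s - lam * (e @ W @ e)

def fuse_distances(dists, weights):
    """Normalize each feature's distances to the same maximum, then weight."""
    normed = [d / max(np.abs(d).max(), 1e-12) for d in dists]
    return sum(w * d for w, d in zip(weights, normed))
```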

3 Experiments and Discussions

Our proposed approach is tested on two public datasets: VIPeR [13] and CUHK01 [14]. Images are normalized to the same size, and \(\lambda \) is set to 1.5 on both datasets. We report the average cumulative match characteristic (CMC) curve over 10 trials for each experiment.
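For reference, a standard single-shot CMC computation looks like the sketch below; the convention of one true match per query follows the usual VIPeR protocol, and the 10-trial averaging happens outside this function.

```python
import numpy as np

def cmc_curve(dist, query_ids, gallery_ids, max_rank=100):
    """CMC from a query x gallery distance matrix (one true match per query)."""
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(dist, axis=1)             # gallery sorted by distance
    ranks = []
    for i, q in enumerate(query_ids):
        hits = np.where(gallery_ids[order[i]] == q)[0]
        ranks.append(hits[0])                    # rank of the first true match
    ranks = np.asarray(ranks)
    return np.array([(ranks < r).mean() for r in range(1, max_rank + 1)])
```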

3.1 Datasets

The VIPeR dataset is widely considered the most challenging dataset in person re-identification and is also among the most widely used. It contains 1264 images of 632 pedestrians captured by two disjoint cameras with different illumination conditions and viewpoints. The 632 pedestrians are randomly split into two groups: 316 for training and the rest for testing.

The CUHK01 dataset contains 971 individuals captured by two non-overlapping cameras. Each person has two images per camera, taken from different angles. The dataset is likewise randomly split into two parts: 485 persons for training and the rest for testing.

3.2 Comparison with State-of-the-Art Methods

In this section, we compare the performance of our method with several state-of-the-art algorithms on the two datasets. As shown in Fig. 2 and Tables 1 and 2, our proposed algorithm outperforms the other methods on both VIPeR and CUHK01. Specifically, the rank-1 rate increases from 40.7% to 42.69% once we use multiple metric learning in kernel space. As Table 2 shows, our method greatly improves the accuracy on the CUHK01 dataset. A reasonable explanation is that projecting the features separately preserves more information than traditional kernel mapping. Besides, the Grey-world normalization and the weights learned by the normalized modified Relief algorithm also prove their effectiveness.

3.3 Evaluation and Analysis

Effect of Grey-world normalization. To illustrate the effectiveness of Grey-world normalization fairly, we use the KISS metric learning of [6] and concatenate the four color feature vectors into a single vector. As shown in Fig. 3, the rank-1 recognition rate of the color features extracted from VIPeR without normalization is 29.56%; after applying Grey-world normalization to the images, the rate rises to 34.37%. In Fig. 4, the rank-1 rate of the color features increases from 36.54% to 39.13%. The experiments show that the results with normalization are consistently better than those without, indicating that the normalization is robust to illumination changes and camera color responses and effectively improves the accuracy of the color features.

Fig. 2. CMC curves of some state-of-the-art methods on the VIPeR dataset. The rank-1 identification rates are marked before the name of each curve.

Effect of multiple metric learning. Instead of concatenating multiple vectors into a single vector, we project each feature separately and learn a corresponding metric for it. For example, in Fig. 3 the four color feature vectors concatenated into a 1152-dimensional vector achieve a rank-1 matching rate of 34.37%. When we instead learn four metrics for the four color features and weight their distances to form the final distance, the rate rises to 38.07%, a 3.7% improvement over 34.37%. The HOG and LBP features without multiple metrics achieve 5.16%, while the rate improves by 2.28% to 7.44% with two learned metrics. Similarly, in the CUHK01 experiments of Fig. 4, the results with multiple metrics over color, shape, and texture features are better than those without. This is because the features are mapped separately instead of projecting one high-dimensional vector, so the multiple metrics can express more discriminative information and avoid information loss in the mapping process.

Table 1. The identification rates (%) on VIPeR at ranks 1, 10, 20, 50, and 100.
Table 2. The identification rates (%) on CUHK01 at ranks 1, 10, 20, 50, and 100.

Effect of Relief algorithm. In this subsection, four common algorithms for determining the weights are compared. Adopt Weight refers to adopted weight learning, which achieves a rank-1 rate of 39.78%; this is because it only considers the rank-1 accuracy when estimating the importance of a feature and ignores the discriminative ability of the features. The Most Probability scheme is also evaluated, and its result is much worse than that of our method. In Fig. 5, the normalized modified Relief performs better than the modified Relief. This is because the maximum distances differ across features, and we normalize them to resolve this, so the weights obtained by the normalized modified Relief achieve better complementarity.

Fig. 3. The effectiveness of Grey-world normalization and multiple metric learning on color, HOG, and LBP features on VIPeR.

Fig. 4. The effectiveness of Grey-world normalization and multiple metric learning on color, HOG, and LBP features on CUHK01.

Fig. 5. The effectiveness of the Relief algorithm.

4 Conclusion

In this paper, we have proposed multiple metric learning in kernel space to address the loss of discriminative information in the computation of the kernel matrix. The experiments demonstrate that projecting the features separately yields better performance than mapping a single high-dimensional vector, and that our approach is effective, outperforming some state-of-the-art methods. In the future, we will investigate feature representation in greater depth, especially color-invariant features.