
1 Introduction

In real-world applications such as video surveillance, captured faces are often of low resolution (LR). In these environments, the LR face image loses important details that discriminate between persons, mainly because of the distance between the subjects and the camera. These images can also present facial variations such as pose and expression, which makes recognition even more challenging. Low-resolution face recognition (LRFR) methods must classify LR test images against high-resolution (HR) gallery images, which gives rise to the dimensional mismatch problem. The dimensional mismatch between gallery/probe pairs and the lack of facial features are therefore among the main challenges of LRFR [1]. Some authors have tried to find resolution-robust feature representations [2], but this is a difficult task because most of the effective features used in HR face recognition, such as texture and color, may fail with LR images. The performance of traditional methods in the LR case suggests that current feature representation approaches are not suitable for LRFR [3]. To improve the results, it becomes a priority to explore alternatives to the feature-based representation.

A representation based on dissimilarities between objects [4] is advantageous in situations where it is easier to define dissimilarities than features. The dissimilarity space (DS) representation has been used successfully in many difficult tasks such as person re-identification [5]. Based on the success of previous works [4], we propose the use of dissimilarity representations as an alternative for LRFR. We believe that more discriminative information for classification can be obtained if the LR images are analyzed in the context of their dissimilarities with other images. However, previous works assumed standard or hand-designed dissimilarities rather than dissimilarities learned automatically for a given problem. Some researchers have shown that classification can be greatly improved by learning a suitable distance metric. We therefore consider improving the original DS representation by applying metric learning methods on top of it. Metric learning provides a way to adapt a distance function to the given task.

In this work, the standard Euclidean distance in the DS is replaced by a learned metric, i.e., a Mahalanobis metric. We compare our proposal with some representative state-of-the-art methods based on feature vector representations. To address the dimensional mismatch, we use the best-performing strategy proposed in [6], where the HR images are down-scaled and then up-scaled, and the LR images are up-scaled, to the same resolution. The proposal is evaluated on different face databases, including SCFace [7], which is a very difficult database because it emphasizes the challenges of face recognition in surveillance environments.

The paper is organized as follows. Section 2 presents the related work on LRFR, the DS representation and the metric learning approach. Section 3 presents our proposal based on DS with automatically learned metrics to cope with the classification problem of LR facial images. Experiments and discussion are presented in Sect. 4, and concluding remarks are provided in Sect. 5.

2 Related Work

With the growing demands of surveillance applications, extensive efforts have been devoted to LRFR research. However, it remains an open issue because of the challenges posed by LR. Furthermore, the different resolutions of gallery and probe images lead to the so-called dimensional mismatch. Different approaches have been used to cope with this problem, such as the unified feature space (CLPM) [8], which projects HR and LR images into a common space. This seems a feasible way to handle the dimensional mismatch, but it is not straightforward to find an optimal inter-resolution space, and the transformation process may introduce noise. Several methods have used super-resolution (SR) techniques. However, these kinds of methods mostly focus on obtaining a good visual reconstruction rather than a higher recognition rate. Current approaches to LRFR mainly rely on feature vector representations, and resolution-robust feature representations have been considered for the LR case. Multidimensional scaling (MDS) [9] is a representative method, in which the relationships between LR and HR images are explored taking the dimensional mismatch problem into account. Many authors have built on this idea, trying to find a common or inter-resolution space onto which LR images and their corresponding HR images can be projected [10].

A dissimilarity representation between objects is an alternative solution. Following the idea proposed in [4], dissimilarities are considered the connection between perception and higher-level knowledge, which are key elements in the process of human recognition and categorization. By using the differences with prototypes to create the representation, we may be able to emphasize information that is relevant for discriminating among the classes and that may be difficult to express in a feature representation obtained by analyzing the image alone. The DS was proposed for LRFR in [6], but only with the standard Euclidean distance. We believe that a suitable distance metric can improve the classification accuracy. For example, it was shown in [11] that k-NN classification accuracy can be improved by using suitable distance metrics. The goal of metric learning algorithms is to improve on standard similarity measures by exploiting prior information in the form of labels.

This work differs from previous approaches in some respects. We propose a dissimilarity-based representation that uses learned metrics to obtain more discriminative distances in LRFR. In particular, our proposal is an alternative to the feature space (FS) representation: it is based on dissimilarities between objects and introduces metric learning to replace the standard Euclidean distance in the DS.

3 Proposed Approach: Dissimilarity Space and Metric Learning for LRFR

A general scheme of the proposed strategy is shown in Fig. 1. In the following, we describe in more detail the construction of the dissimilarity space and the metric learning approaches.

Fig. 1. General scheme of our proposal

3.1 Dissimilarity Space

Duin and Pekalska [4] proposed the DS as a Euclidean vector space in which several statistical classifiers can be used. Although it has been used to solve a number of problems [12, 13], its advantages for solving the dimensional mismatch in the LR case have not been explored yet. Proximity information is intuitively more discriminative than the features or the composition of each object considered independently. Based on these advantages, we use the dissimilarity space to obtain a more discriminative relational representation of the LR images. Let X be the space of objects, let \(R =\{r_{1},r_{2},\ldots ,r_{k}\}\) be the set of prototypes such that \(R\subseteq X\), and let \(d:X\times X\rightarrow {\mathbb {R}^{+}}\) be a suitable dissimilarity measure for the problem. For a training set \(T =\{x_1,x_2,\ldots ,x_l\}\) such that \(T\subseteq X\), a mapping \(\phi ^{d}_{R}:X \rightarrow {\mathbb {R}}^{k}\) defines the embedding of training and test objects in the DS by their dissimilarities with the prototypes:

$$\begin{aligned} \phi ^{d}_{R}(x_i) = [d(x_i,r_{1}),\; d(x_i,r_{2}),\; \ldots ,\; d(x_i,r_{k})]. \end{aligned}$$
(1)
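The embedding in Eq. (1) can be illustrated with a minimal Python sketch. The function names `euclidean` and `dissimilarity_embedding`, the toy data, and the choice of prototypes are assumptions made only for illustration; the Euclidean distance is a placeholder for whatever dissimilarity d suits the problem.

```python
import numpy as np

def euclidean(a, b):
    """Placeholder dissimilarity d(a, b); any non-negative measure can be swapped in."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def dissimilarity_embedding(objects, prototypes, d=euclidean):
    """Map each object x_i to [d(x_i, r_1), ..., d(x_i, r_k)] as in Eq. (1)."""
    return np.array([[d(x, r) for r in prototypes] for x in objects])

# Toy data (hypothetical): 4 objects described by 3 values each.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [3.0, 4.0, 0.0]])
R = X[[0, 3]]  # two prototypes r_1, r_2 taken from the object set

D = dissimilarity_embedding(X, R)
print(D.shape)  # one row per object, one column per prototype
```

Each row of `D` is the DS representation of one object, so its dimensionality equals the number of prototypes k regardless of the size of the original images.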

3.2 Metric Learning Approach

In this section, we introduce the general idea of metric learning for k-NN classification and review some previously studied approaches: LMNN, which directly attempts to optimize the k-NN classification error; a method based on Linear Discriminant Analysis (LDA) [14]; and the KISS metric learning method.

Large Margin Nearest Neighbor (LMNN): A mapping \(D:X\times X\rightarrow {\mathbb {R}}^{+}_{0}\) over a vector space X is a metric if, for all vectors \(\overrightarrow{x_{i}}, \overrightarrow{x_{j}}, \overrightarrow{x_{k}} \in X\), it satisfies properties such as symmetry and the triangle inequality [15]. A family of metrics on X can be obtained by computing Euclidean distances after applying a linear transformation \(\overrightarrow{x}' = L\overrightarrow{x}\). These metrics compute squared distances that can be expressed in terms of the square matrix \(M = L^{\top }L\), i.e., \(D_{M}(\overrightarrow{x_{i}},\overrightarrow{x_{j}}) = (\overrightarrow{x_{i}}-\overrightarrow{x_{j}})^{\top }M(\overrightarrow{x_{i}}-\overrightarrow{x_{j}})\). Any matrix M formed in this way from a real-valued matrix L is guaranteed to be positive semidefinite, and the resulting distance is referred to as a Mahalanobis metric. In LMNN, these distances are viewed as generalizations of Euclidean distances, which are recovered by setting M to the identity matrix. The idea is based on the observation that k-NN classification performs well on a sample if its k nearest neighbors share its label. LMNN learns a linear transformation of the input space, applied before k-NN classification with Euclidean distances, that increases the number of training samples with this property. The approach has the advantage of improving the original Euclidean distance from a classification perspective and, in some cases, of providing a lower-dimensional embedding of the data.
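The relation between a linear transformation L and the induced Mahalanobis matrix can be checked numerically. The following sketch does not implement LMNN itself; it only illustrates, with a randomly drawn (hypothetical) matrix L, that the squared distance under M equals the squared Euclidean distance after the mapping, and that the identity matrix recovers the plain Euclidean case. The function name `mahalanobis_sq` is an assumption for illustration.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a positive semidefinite M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))   # hypothetical linear map, here reducing 3 dimensions to 2
M = L.T @ L                   # positive semidefinite by construction

x = rng.normal(size=3)
y = rng.normal(size=3)

d_metric = mahalanobis_sq(x, y, M)
d_mapped = float(np.sum((L @ x - L @ y) ** 2))   # squared Euclidean distance after mapping
d_identity = mahalanobis_sq(x, y, np.eye(3))     # M = I gives the squared Euclidean distance
```

Because L here has fewer rows than columns, the same matrix also provides a lower-dimensional embedding, which is the situation mentioned at the end of the paragraph above.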

Linear Discriminant Analysis: Different ways have been proposed to estimate Mahalanobis distance metrics for k-NN classification. One family of methods relies on eigendecomposition, which has been used to discover informative linear transformations of the input space; such transformations can be seen as inducing a Mahalanobis distance metric in the original space. LDA is a representative eigenvector method. It operates in a supervised setting and uses the class labels of the inputs to derive informative linear projections. In the context of metric learning, LDA computes a linear transformation L that maximizes the ratio of between-class to within-class variance, subject to the constraint that L defines a projection matrix. The traditional LDA algorithm remains attractive compared with several recently developed metric learning algorithms [16].
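A minimal NumPy sketch of LDA used as a metric learner is given below. It is one possible formulation, not necessarily the implementation used in the experiments: the function name `lda_projection`, the small ridge term `reg`, the whitening-based solution of the eigenproblem, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def lda_projection(X, y, n_components, reg=1e-6):
    """Return L whose rows maximise between-class over within-class variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    S_w = np.zeros((dim, dim))  # within-class scatter
    S_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        gap = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (gap @ gap.T)
    S_w += reg * np.eye(dim)  # assumption: small ridge term so that S_w is invertible
    # Whiten S_w, then diagonalise S_b in the whitened space (a symmetric eigenproblem).
    w_vals, w_vecs = np.linalg.eigh(S_w)
    W = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    b_vals, b_vecs = np.linalg.eigh(W @ S_b @ W)
    order = np.argsort(b_vals)[::-1][:n_components]
    return (W @ b_vecs[:, order]).T  # shape (n_components, dim)

# Toy data (hypothetical): classes differ along axis 0, axis 1 is mostly noise.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=[0.3, 3.0], size=(50, 2))
class1 = rng.normal(loc=[2.0, 0.0], scale=[0.3, 3.0], size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

L = lda_projection(X, y, n_components=1)
M = L.T @ L  # Mahalanobis matrix induced by the LDA transformation
```

On this toy problem the learned direction weights the discriminative axis far more than the noisy one, so distances computed with `M` largely ignore the within-class variation.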

Keep It Simple and Straightforward Metric Learning (KISSME): Another strategy is to learn an optimal distance measure for genuine and impostor pairs. Koestinger et al. [17] proposed an effective method to learn the distance metric based on a likelihood-ratio test. Equivalence constraints are considered natural inputs to metric learning methods because similarity functions mainly establish a relation between pairs of points. KISSME [17] computes the covariance matrices of the pairwise differences of similar and dissimilar pairs and uses the difference of their inverses as the matrix of the Mahalanobis metric. Because it applies the log-likelihood ratio test of two Gaussian distributions, a simplified closed-form solution can be derived. It therefore does not rely on complex iterative optimization, which is an advantage for practical applications.
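The closed-form nature of KISSME can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the reference implementation: the function name `kissme` is hypothetical, all pairs of training samples are enumerated (practical only for small sets), and negative eigenvalues are clipped to zero so that the result is a valid positive semidefinite matrix.

```python
import numpy as np

def kissme(X, y):
    """Closed-form metric: inverse covariance of similar-pair differences
    minus inverse covariance of dissimilar-pair differences."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    i, j = np.triu_indices(len(X), k=1)   # all unordered sample pairs
    diffs = X[i] - X[j]
    same = y[i] == y[j]
    cov_s = diffs[same].T @ diffs[same] / same.sum()      # similar (genuine) pairs
    cov_d = diffs[~same].T @ diffs[~same] / (~same).sum() # dissimilar (impostor) pairs
    M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    # Clip negative eigenvalues so that M is positive semidefinite.
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T

# Toy data (hypothetical): classes differ along axis 0, axis 1 is mostly noise.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=[0.3, 3.0], size=(50, 2))
class1 = rng.normal(loc=[2.0, 0.0], scale=[0.3, 3.0], size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

M = kissme(X, y)
```

No iterative optimization is involved: the cost is dominated by two covariance estimates, two matrix inversions, and one eigendecomposition.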

4 Experimental Evaluation

We present the results of the proposed scheme for low-resolution face recognition. Two different databases were considered for the experiments: the SCFace database [7] and Labeled Faces in the Wild (LFW) [18]. In all of our experiments, the test images were obtained by down-scaling the original images to different sizes using bicubic interpolation. Bicubic interpolation was also applied in the up-scaling process to obtain high-resolution images. The standard Euclidean distance in the DS was replaced by a learned metric, and the linear discriminant classifier (LDC), which assumes equal covariance matrices for the classes, was used. We computed local binary patterns (LBP) on local blocks of the geometrically normalized images; a histogram was computed on each block, and the histograms were concatenated. The dissimilarity measure was thus computed on top of a feature representation. In particular, we created the dissimilarity space using the chi-square distance between LBP histograms, since it is a more discriminative measure for histograms.
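As an illustration of the dissimilarity measure, the sketch below computes a chi-square distance between histograms and uses it to build a DS vector. The exact variant of the chi-square distance is not specified in the text; the form used here (without a factor of 1/2), the small constant `eps`, and the toy histograms standing in for concatenated block-wise LBP histograms are assumptions.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance sum_b (h1_b - h2_b)^2 / (h1_b + h2_b) between two histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))  # eps avoids 0/0 on empty bins

# Hypothetical normalised histograms standing in for concatenated LBP histograms.
h_a = np.array([0.5, 0.5, 0.0, 0.0])
h_b = np.array([0.0, 0.0, 0.5, 0.5])
h_c = np.array([0.4, 0.4, 0.1, 0.1])

prototypes = [h_a, h_b]
ds_vector = np.array([chi_square(h_c, r) for r in prototypes])  # DS representation of h_c
```

The histogram `h_c` is closer to `h_a` than to `h_b`, which is reflected in the first component of `ds_vector` being the smaller one.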

4.1 Experiments and Discussion on SCFace Database

The SCFace database [7] was designed specifically to simulate video-surveillance scenarios, and is thus the most suitable for evaluating the low-resolution problem. It consists of 4160 images of 130 people taken in an uncontrolled environment. The images were captured at three different distances, namely 4.20 m (distance1), 2.60 m (distance2), and 1.00 m (distance3), each with five cameras (cam1, cam2, cam3, cam4, cam5). The illumination was uncontrolled, and the captured images differ in quality, type, and resolution. Example images for distances 2 and 3 appear in Fig. 2.

Fig. 2. Some examples of SCFace database

To compare our method with existing approaches, we follow the protocol in [19], where the images from distance 3 were normalized to \(48\times 48\) pixels as HR images, while the corresponding LR images of \(16\times 16\) pixels were obtained from distance 2. In addition, 80 subjects were selected for training and the remaining 50 subjects were used for testing. The experiment was repeated 5 times using 200 PCA components, which provided the best results. The recognition rates and their standard deviations are reported in Table 1. As can be seen in Table 1, the proposed scheme in general achieves relatively high and stable recognition rates compared with the other state-of-the-art algorithms reported in [19]. In particular, the best result is obtained using LDA metric learning, with a considerably higher recognition rate.
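The subject-disjoint protocol can be sketched as follows. The text does not state how the 80 training subjects are chosen in each repetition; the sketch assumes they are drawn at random, and the function name `subject_disjoint_split`, the number of images per subject, and the random seed are hypothetical.

```python
import numpy as np

def subject_disjoint_split(subject_ids, n_train_subjects, rng):
    """Return (train_idx, test_idx) so that no subject appears in both sets."""
    subjects = rng.permutation(np.unique(subject_ids))
    train_subjects = subjects[:n_train_subjects]
    train_mask = np.isin(subject_ids, train_subjects)
    return np.flatnonzero(train_mask), np.flatnonzero(~train_mask)

# Hypothetical labels: 130 subjects with 2 images each (e.g. one HR and one LR image).
subject_ids = np.repeat(np.arange(130), 2)

rng = np.random.default_rng(0)
splits = [subject_disjoint_split(subject_ids, 80, rng) for _ in range(5)]
```

The mean and standard deviation of the recognition rate would then be computed over the five entries of `splits`.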

Table 1. Recognition rates in the SCFace database

4.2 Experiments and Discussion on LFW Database

To corroborate the obtained results on another dataset and to assess the proposed use of metric learning over the dissimilarity space, we conducted experiments on the LFW database [18]. It contains 13233 labeled faces of 5749 people. A subset of the database consisting of 3832 images belonging to 178 subjects was used in the experiments, obtained by selecting the subjects with 8 or more images. The data are challenging, as the faces were detected in the wild, in images taken from Yahoo! News. The images present variations in pose, scale, clothing, expression, focus, resolution, and other factors. Some example images are shown in Fig. 3.

Fig. 3. Some examples of LFW database

All images were geometrically normalized by the centers of the eyes, to \(16\times 16\) pixels for LR and to \(48\times 48\) pixels for HR. We randomly divided the dataset into training and test sets of equal size, repeating the split five times. In this experiment, we compare the standard Euclidean distance with the learned metric in the DS. The resulting error rates are shown in Table 2. The results in Table 2 show that learning a Mahalanobis metric to replace the Euclidean distance improves classification in the DS by a large margin.

Table 2. Error rates in the LFW database

5 Conclusions

In this paper, we presented the use of metric learning to learn a Mahalanobis distance metric for LRFR in the dissimilarity space. The learned metric pulls objects of the same class closer together while pushing objects of different classes apart. Unlike current methods for the LR case, which mostly work in the feature space, we proposed a new representation space based on dissimilarities between objects and improved the classification in this space with metric learning. We evaluated our proposal on two challenging datasets. Experiments showed improvements over previously reported methods. Therefore, improving representations based on relational information seems to be a promising line for future research.