1 Introduction

Object recognition based on visual content is a fundamental problem in computer vision. Recently, image set based object recognition approaches have been widely studied and have attracted increasing interest. Image set based object recognition aims to recognize an object by using multiple images (or a video) of that object, rather than a single image.

One key problem in image set based object recognition is how to effectively represent an image set [5, 9, 11, 22, 26]. In recent years, many methods have been proposed for this problem. One popular class of methods uses statistical models, which represent an image set by one or more distributions, such as a Gaussian or a Gaussian mixture model (GMM) [24]. Based on these representations, the similarity between two image sets can be computed by a metric between distributions [1, 18, 20, 22]. Another class of methods uses linear subspace models, which represent an image set by a linear or affine subspace; the distance between two image sets is then measured as the distance between the two subspaces [9, 10, 15, 25]. Recent studies also represent an image set by a nonlinear manifold or several sub-manifolds, and generally use manifold metric learning to perform image set recognition tasks [3, 4, 7, 11, 21, 23, 26]. Several other methods have also been proposed [6].

Previous works generally focus on developing methods for image set feature extraction and classification. In this paper, we focus on the image set itself and propose a robust low-rank representation for it. Motivated by recent work on low-rank representation [12, 19], we propose a new low-rank model, called \(L_1\)-norm based robust optimal Mean Principal Component Analysis (L1-MPCA), for image set recovery and representation. The aim of L1-MPCA is to integrate the mean calculation into the \(L_1\)-norm PCA low-rank approximation objective, so that the optimal mean can be obtained to enhance the low-rank approximation and representation. An effective optimization algorithm is derived to solve the proposed L1-MPCA model. Compared with the original observed image sets, the L1-MPCA representations are generally less noisy and more regular, which significantly encourages robust learning and recognition. Experimental results on four datasets demonstrate the effectiveness and robustness of the proposed L1-MPCA method.

2 Brief Review of Optimal Mean PCA

In this section, we give a brief introduction to the Optimal Mean Principal Component Analysis (OMPCA) model [19]. Let \(\mathbf X =(\mathbf x _1,\mathbf x _2,\cdots ,\mathbf x _n)\in \mathbb {R}^{p\times n}\) be the input data matrix containing a collection of n data column vectors in p-dimensional space. In image set representation, each column \(\mathbf x _i\) denotes one linearized array of pixel gray levels. The aim of OMPCA [19] is to find the optimal low-dimensional matrices \(\mathbf U =(\mathbf u _1,\mathbf u _2,\cdots ,\mathbf u _k)\in \mathbb {R}^{p\times k}\), \(\mathbf V =(\mathbf v _1,\mathbf v _2,\cdots ,\mathbf v _k)\in \mathbb {R}^{n\times k}\) and mean vector \(\mathbf b \in \mathbb {R}^{p}\) by minimizing

$$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \quad \sum ^n_{i=1}\Vert \mathbf x _i - \mathbf U \mathbf v _i - \mathbf b \Vert _2^2=\Vert \mathbf X - \mathbf U \mathbf V ^{{\mathrm{T}}} - \mathbf b \mathbf 1 ^{{\mathrm{T}}}\Vert ^2_{F} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned}$$
(1)

where \(\mathbf 1 =(1,1,\cdots ,1)^{{\mathrm{T}}}\in \mathbb {R}^n\). Let \(\mathbf Z =\mathbf U \mathbf V ^{{\mathrm{T}}} +\mathbf b \mathbf 1 ^{{\mathrm{T}}}\); then \(\mathbf Z \) provides a kind of low-rank representation of the original input data \(\mathbf X \). It is known that the squared loss function used in the above OMPCA is very sensitive to outliers. To overcome this problem, Nie et al. [19] also proposed a robust OMPCA that uses the \(L_{2,1}\) norm and solves the optimization problem

$$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \quad \sum ^n_{i=1}\Vert \mathbf x _i - \mathbf U \mathbf v _i - \mathbf b \Vert _2=\Vert \mathbf X - \mathbf U \mathbf V ^{{\mathrm{T}}} - \mathbf b \mathbf 1 ^{{\mathrm{T}}}\Vert _{2,1} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned}$$
(2)

Compared with the Frobenius-norm loss function, the \(L_{2,1}\)-norm loss function is more robust to outliers because the per-sample errors are not squared.
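For the squared-loss model in Eq. (1), the minimizer is classical: the optimal mean \(\mathbf b \) is the sample mean of the columns of \(\mathbf X \), and \(\mathbf U \) collects the top-k left singular vectors of the centered data. The following minimal sketch (our own NumPy illustration, not code from [19]) makes this concrete:

```python
import numpy as np

def ompca(X, k):
    """OMPCA under squared loss. X: (p, n) data matrix; k: target rank.
    Returns U (p, k), V (n, k), b (p,)."""
    b = X.mean(axis=1)                    # optimal mean under the squared loss
    Xc = X - b[:, None]                   # centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                         # orthonormal basis: Uk.T @ Uk = I
    Vk = Xc.T @ Uk                        # coefficients, so Uk @ Vk.T approximates Xc
    return Uk, Vk, b

# Low-rank representation Z = U V^T + b 1^T as defined in the text:
# U, V, b = ompca(X, k=10); Z = U @ V.T + b[:, None]
```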

3 \(L_1\)-Norm Based Robust MPCA

The above \(L_{2,1}\)-norm OMPCA is robust to outlier samples. However, it is still sensitive to corruptions or large errors within each image \(\mathbf x _i\), because an \(L_2\)-norm loss is used inside each image vector. Our aim in this section is to propose a new robust OMPCA that uses an \(L_1\)-norm loss function instead of the \(L_{2,1}\)-norm loss function.

Model formulation. Formally, let \(\mathbf X =(\mathbf x _1,\mathbf x _2,...,\mathbf x _n)\in \mathbb {R}^{p\times n}\) be the image set data. Our \(L_1\)-norm based MPCA (L1-MPCA) is formulated as

$$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \quad \sum ^n_{i=1}\Vert \mathbf x _i - \mathbf U \mathbf v _i - \mathbf b \Vert _1=\Vert \mathbf X - \mathbf U \mathbf V ^{{\mathrm{T}}} - \mathbf b \mathbf 1 ^{{\mathrm{T}}}\Vert _{1} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned}$$
(3)

where the \(L_1\)-norm loss function is defined as \(\Vert \mathbf A \Vert _1=\sum _i\sum _j | \mathbf A _{ij}|\). The \(L_1\)-norm loss makes the proposed L1-MPCA robust to both entrywise corruptions/large errors and outlier samples. Note that the above L1-MPCA can be regarded as a natural extension of the traditional L1-PCA model [2, 13] that additionally removes the optimal mean automatically from the input data set \(\mathbf X \).
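To make the difference between the three loss functions concrete, the short sketch below compares them on a residual matrix \(\mathbf A = \mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}\); it is an illustration we add here, not part of the original model:

```python
import numpy as np

def fro_loss(A):
    # Squared Frobenius norm: squares every entry, so outliers dominate
    return np.sum(A ** 2)

def l21_loss(A):
    # L_{2,1} norm: L2 norm of each column (image), summed over columns;
    # robust to outlier images but still squares errors within an image
    return np.sum(np.linalg.norm(A, axis=0))

def l1_loss(A):
    # L_1 norm: absolute values of all entries; robust to entrywise corruptions
    return np.sum(np.abs(A))
```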

Optimization. We present an effective updating algorithm to solve the L1-MPCA model. First, Eq. (3) can be rewritten equivalently as

$$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf E ,\mathbf b } \quad \Vert \mathbf E \Vert _{1} \qquad s.t. \quad \mathbf E = \mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}, \quad \mathbf U ^{{\mathrm{T}}}\mathbf U = \mathbf I \end{aligned}$$
(4)

We use the Augmented Lagrange Multiplier (ALM) method to solve this problem. ALM solves a sequence of subproblems of the form

$$\begin{aligned}&\min _\mathbf{U ,\mathbf V ,\mathbf E ,\mathbf b } \quad \Vert \mathbf E \Vert _{1} + \mathrm {Tr}\big (\mathrm {\Omega }^{{\mathrm{T}}}(\mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}-\mathbf E )\big ) + \frac{\mu }{2} \Vert \mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}-\mathbf E \Vert ^2_{F}\nonumber \\&\quad \quad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U = \mathbf I \end{aligned}$$
(5)

where \(\mathrm {\Omega }\) is the matrix of Lagrange multipliers and \(\mu \) is the penalty parameter. The algorithm has two major parts: solving the subproblems and updating the parameters \((\mathrm {\Omega },\mu )\).

First, by completing the square, the objective function of Eq. (5) can be rewritten, up to a constant that does not depend on the variables, as

$$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf E ,\mathbf b } \quad \Vert \mathbf E \Vert _{1} + \frac{\mu }{2} \Vert \mathbf E -(\mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}+\frac{\mathrm {\Omega }}{\mu }) \Vert _{F}^2 \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U = \mathbf I \end{aligned}$$
(6)

Then, we iteratively solve the following sub-problems until convergence; a compact sketch of the full loop is given after this list.

  1. Solve \(\mathbf U ,\mathbf V ,\mathbf b \) while fixing \(\mathbf E \). The problem becomes

    $$\begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \quad \Vert (\mathbf X -\mathbf E +\frac{\mathrm {\Omega }}{\mu }) -\mathbf b \mathbf 1 ^{{\mathrm{T}}}- \mathbf U \mathbf V ^{{\mathrm{T}}}\Vert ^2_{F} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned}$$
    (7)

    This is the standard OMPCA problem [19] and can be solved effectively in closed form.

  2. Solve \(\mathbf E \) while fixing \(\mathbf U ,\mathbf V ,\mathbf b \). The problem becomes

    $$\begin{aligned} \min _\mathbf E \quad \Vert \mathbf E \Vert _{1} + \frac{\mu }{2}\Vert \mathbf E -(\mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}} +\frac{\mathrm {\Omega }}{\mu }) \Vert ^2_{F} \end{aligned}$$
    (8)

    It is well known that this problem has the closed-form solution given by elementwise soft-thresholding,

    $$\begin{aligned} \mathbf E _{ij} = \mathrm {sign}(\mathbf K _{ij})\,(|\mathbf K _{ij}| - \frac{1}{\mu })_{+}, \qquad \mathbf K = \mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}}-\mathbf b \mathbf 1 ^{{\mathrm{T}}}+\frac{\mathrm {\Omega }}{\mu } \end{aligned}$$
    (9)
  3. At the end of each ALM iteration, \(\mathrm {\Omega }\) and \(\mu \) are updated as

    $$\begin{aligned} \begin{aligned}&\mathrm {\Omega } = \mathrm {\Omega } + \mu (\mathbf X -\mathbf U \mathbf V ^{{\mathrm{T}}} - \mathbf b \mathbf 1 ^{{\mathrm{T}}} - \mathbf E ) \\&\mu = \rho \mu \end{aligned} \end{aligned}$$
    (10)

    where \(\rho > 1\).
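Putting the three steps together, a minimal sketch of the full ALM loop is given below. The initial value of \(\mu \), the growth factor \(\rho \), and the iteration count are our assumptions for illustration, not settings reported in this paper.

```python
import numpy as np

def l1_mpca(X, k, mu=1e-3, rho=1.1, n_iter=200):
    """ALM sketch for the L1-MPCA problem of Eq. (4).
    X: (p, n) image set matrix; k: target rank."""
    p, n = X.shape
    E = np.zeros((p, n))          # entrywise error term
    Omega = np.zeros((p, n))      # Lagrange multipliers
    for _ in range(n_iter):
        # Step 1: OMPCA subproblem (Eq. (7)) on M = X - E + Omega/mu,
        # solved in closed form via the mean and an SVD.
        M = X - E + Omega / mu
        b = M.mean(axis=1)
        Mc = M - b[:, None]
        U, _, _ = np.linalg.svd(Mc, full_matrices=False)
        U = U[:, :k]
        V = Mc.T @ U
        # Step 2: soft-thresholding update for E (Eq. (9)).
        K = X - U @ V.T - b[:, None] + Omega / mu
        E = np.sign(K) * np.maximum(np.abs(K) - 1.0 / mu, 0.0)
        # Step 3: multiplier and penalty updates (Eq. (10)).
        Omega = Omega + mu * (X - U @ V.T - b[:, None] - E)
        mu = rho * mu
    return U, V, b
```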

4 Application: Image Set Representation and Classification

In this section, we apply the proposed L1-MPCA to image set representation and classification tasks. Our method contains two main steps.

First, given an image set \(\mathbf X =(\mathbf x _1,\mathbf x _2,\cdots ,\mathbf x _n)\), we use the proposed L1-MPCA to compute the optimal \(\mathbf U ^*,\mathbf V ^*\) and mean vector \(\mathbf b ^*\). We then obtain the optimal low-rank representation \(\mathbf Z \) as

$$ \mathbf Z =\mathbf U ^*\mathbf V ^{*{\mathrm{T}}} +\mathbf b ^*\mathbf 1 ^{{\mathrm{T}}} $$

Compared with the original image set data \(\mathbf X \), image noise and outliers are well suppressed in the low-rank representation \(\mathbf Z \).

Second, based on the low-rank representation \(\mathbf Z \), we can use existing image set feature extraction and learning methods, such as Covariance Discriminative Learning (CDL) [22], the Covariate-relation graph (CRG) [6] and the Manifold-Manifold Distance (MMD) [23], to conduct image set classification, as sketched below.
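A possible shape of this two-step pipeline is shown in the following sketch; `set_classifier` is a hypothetical placeholder for a set-based classifier such as CDL, CRG or MMD, which we do not implement here.

```python
def represent(X, k=10):
    """Recover the low-rank representation Z of one image set via L1-MPCA."""
    U, V, b = l1_mpca(X, k)            # from the sketch in Sect. 3
    return U @ V.T + b[:, None]        # Z = U* V*^T + b* 1^T

# train_sets, test_sets: lists of (p, n_i) arrays, one array per image set.
# Z_train = [represent(X) for X in train_sets]
# Z_test  = [represent(X) for X in test_sets]
# predictions = set_classifier(Z_train, train_labels, Z_test)  # hypothetical
```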

5 Experiments

To evaluate the effectiveness of the proposed L1-MPCA method, we apply it to image set representation and classification tasks. For image set learning, we use several recent methods: Covariance Discriminative Learning (CDL) [22], the Covariate-relation graph (CRG) [6], the Manifold-Manifold Distance (MMD) [23], Set-to-Set Distance Metric Learning (SSDML) [27] and Discriminative Canonical Correlations (DCC) [15]. Following [23], the MMD method learns subspaces that retain 95% of the data energy via PCA. For the discriminative learning step of CDL, we choose PLS. For the SSDML method, we set \(\nu = 1, \lambda _1 = 0.001\) and \(\lambda _2 = 0.1\).

5.1 Datasets and Settings

In our experiments, we test our L1-MPCA on four datasets: YouTube Celebrities (YTC) [14], ETH-80 [17], Honda/UCSD [16] and CMU MoBo [8]. For each dataset, we resize all images to \(20 \times 20\) intensity images. The datasets are described as follows.

  • ETH-80 [17] contains image sets of 8 categories; each category contains 10 objects with 41 views per object, spaced equally over the viewing hemisphere, for a total of 3280 images. For each category, we randomly choose 5 object sets for training and the remaining 5 for testing.

  • YouTube-Celebrities (YTC) [14] contains 1910 video clips of 47 celebrities (actors and politicians). Most of the videos are of low resolution and highly compressed, which leads to noisy, low-quality image frames. Each clip contains hundreds of frames. For this dataset, we randomly choose 3 sets for training and 6 sets for testing.

  • Honda/UCSD [16] consists of 59 video sequences of 20 different persons, and each individual has at least two videos. Each sequence contains about 400 frames covering large variations. For this dataset, we randomly select one sequence of each person for training and use the rest for testing.

  • CMU MoBo [8] contains 96 sequences of 24 persons, captured under different walking situations (e.g., inclined walk, and slow walk while holding a ball to inhibit arm swing). Each video is further divided into four illumination sets; the first set is used for training and the remaining sets for testing.

Fig. 1. Average accuracies of different methods on four datasets.

5.2 Results Analysis

To evaluate the benefit of the proposed L1-MPCA low-rank representation, we compare it with using the original data and with the L1-PCA method [2, 13]. Figure 1 summarizes the average classification results on the four datasets. We observe the following. (1) Compared with using the original image sets \(\mathbf X \), the proposed L1-MPCA significantly improves the image set classification results, which clearly demonstrates its effectiveness for image set representation. (2) The proposed L1-MPCA generally performs better than the L1-PCA method [2, 13], which clearly demonstrates the benefit of additionally estimating the optimal mean vector in the low-rank representation.

Table 1. Classification accuracies on ETH80 dataset with different noises.
Table 2. Classification accuracies on YTC dataset with different noises.
Table 3. Classification accuracies on Honda dataset with different noises.
Table 4. Classification accuracies on CMU dataset with different noises.

5.3 Robust to Noise

To evaluate the robustness of the L1-MPCA method to noise that may appear in image set data, we randomly add noise to the image set datasets. We consider two kinds of noise: salt & pepper noise and block noise. For each kind, we add several noise levels and test our method on the resulting noisy image sets. Tables 1, 2, 3 and 4 show the accuracies of all the compared methods across the different noise levels. From the results, we note the following. (1) As the noise level increases, our L1-MPCA method maintains better performance than using the original image set data, which clearly indicates the noise-removal ability of the proposed L1-MPCA. (2) L1-MPCA performs better than the L1-PCA method [2], indicating the stronger robustness of L1-MPCA in reconstructing noisy data.

6 Conclusion

In this paper, we propose a new method, the \(L_1\)-norm Mean PCA (L1-MPCA) model, for image set representation and learning problems. L1-MPCA is robust to both noise and outliers, which encourages robust image set learning. An effective update algorithm has been derived to solve the proposed L1-MPCA model. Experimental results on several datasets show the benefit and robustness of the proposed method. In future work, we will further consider the manifold structure of the data in our L1-MPCA model.