Abstract
Many problems in computer vision can be formulated as image set representation and classification. One main challenge is that image set data usually contain various kinds of noise and outliers, which make recognition and learning tasks on image sets more challenging. In this paper, we propose a new \(L_1\) norm optimal Mean Principal Component Analysis (L1-MPCA) to learn an optimal low-rank representation for image sets. Compared with the original observed image set, the L1-MPCA based low-rank representation is generally noiseless and thus encourages a more robust learning process. An effective update algorithm is proposed to solve the L1-MPCA model. Experimental results on several datasets demonstrate the effectiveness and robustness of the proposed L1-MPCA method.
1 Introduction
Object recognition/learning based on visual content information is a fundamental problem in computer vision. Recently, image set based object recognition approaches have been widely studied and have attracted increasing interest. Image set based object recognition aims to perform recognition/learning using multiple images (or a video) belonging to one object.
One key problem in image set based object recognition is how to effectively represent an image set [5, 9, 11, 22, 26]. In recent years, many methods have been proposed for this problem. One popular class of methods uses statistical models, which represent an image set by distributions such as a Gaussian or a Gaussian mixture model (GMM) [24]. Based on these representations, the similarity between two image sets can be computed by a metric between distributions [1, 18, 20, 22]. Another class of methods uses linear subspace models, which represent an image set by a linear or affine subspace; the distance between two image sets is then measured as the distance between the corresponding subspaces [9, 10, 15, 25]. Recent studies also represent an image set by a nonlinear manifold or several sub-manifolds and then use manifold metric learning to achieve image set recognition/learning tasks [3, 4, 7, 11, 21, 23, 26]. Some other methods have also been proposed [6].
Previous works generally focus on developing methods for image set feature extraction and classification. In this paper, we focus on the image set data itself and propose a robust low-rank representation for it. Motivated by recent work on low-rank representation [12, 19], we propose a new low-rank model, called \(L_1\) norm based robust optimal Mean Principal Component Analysis (L1-MPCA), for image set recovery and representation. The aim of L1-MPCA is to integrate the mean calculation into the \(L_1\)-norm PCA low-rank approximation objective, so that the optimal mean is obtained jointly and enhances the low-rank approximation and representation. An effective optimization algorithm is derived to solve the proposed model. Compared with the original observed image sets, their L1-MPCA representations are generally noiseless and more regular, which significantly encourages robust learning and recognition. Experimental results on four datasets demonstrate the effectiveness and robustness of the proposed L1-MPCA method.
2 Brief Review of Optimal Mean PCA
In this section, we give a brief introduction to the Optimal Mean Principal Component Analysis (OMPCA) model [19]. Let \(\mathbf X =(\mathbf x _1,\mathbf x _2,\cdots \mathbf x _n)\in \mathbb {R}^{p\times n}\) be the input data matrix containing a collection of n data column vectors in p-dimensional space. In image set representation, each column \(\mathbf x _i\) denotes one vectorized array of pixel gray levels. The aim of OMPCA [19] is to find the optimal low-dimensional matrices \(\mathbf U =(\mathbf u _1,\mathbf u _2,\cdots \mathbf u _k)\in \mathbb {R}^{p\times k}\), \(\mathbf V =(\mathbf v _1,\mathbf v _2,\cdots \mathbf v _k)\in \mathbb {R}^{n\times k}\) and mean vector \(\mathbf b \in \mathbb {R}^{p}\) by minimizing,
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \Vert \mathbf X -\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf U {} \mathbf V ^{{\mathrm{T}}}\Vert ^2_{F} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(1)
where \(\mathbf 1 =(1,1,\cdots ,1)^{{\mathrm{T}}}\in \mathbb {R}^n\). Let \(\mathbf Z =\mathbf UV ^{{\mathrm{T}}} +\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}\); then \(\mathbf Z \) provides a kind of low-rank representation of the original input data \(\mathbf X \). It is known that the squared loss function used in the above OMPCA is very sensitive to outliers. To overcome this problem, Nie et al. [19] also proposed a robust OMPCA using the \(L_{2,1}\) norm, which solves the optimization problem
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \Vert \mathbf X -\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf U {} \mathbf V ^{{\mathrm{T}}}\Vert _{2,1} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(2)
where \(\Vert \mathbf A \Vert _{2,1}=\sum _{j}\sqrt{\sum _{i}\mathbf A _{ij}^2}\).
Compared with the Frobenius-norm loss, the \(L_{2,1}\)-norm loss performs robustly w.r.t. outliers because it uses a non-squared loss for each data column.
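For the Frobenius-norm objective in Eq. (1), the optimal mean \(\mathbf b\) turns out to be the sample mean of the columns, after which \(\mathbf U,\mathbf V\) follow from a rank-k SVD of the centered data [19]. A minimal NumPy sketch (function and variable names are our own, not from the paper):

```python
import numpy as np

def ompca(X, k):
    """Closed-form OMPCA for the squared (Frobenius) loss of Eq. (1).

    X : (p, n) matrix, one vectorized image per column.
    k : number of principal components.
    """
    b = X.mean(axis=1, keepdims=True)            # optimal mean vector
    Uk, s, Vt = np.linalg.svd(X - b, full_matrices=False)
    U = Uk[:, :k]                                # p x k, with U^T U = I
    V = Vt[:k, :].T * s[:k]                      # n x k coefficients
    return U, V, b
```

The low-rank representation is then \(\mathbf Z = \mathbf U\mathbf V^{\mathrm T} + \mathbf b\mathbf 1^{\mathrm T}\).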
3 \(L_1\)-Norm Based Robust MPCA
The above \(L_{2,1}\)-norm OMPCA is robust to outliers. However, it is sensitive to corruptions or large errors within each image \(\mathbf x _i\), because an \(L_2\)-norm loss is still used for each image. Our aim in this section is to propose a new robust OMPCA using an \(L_1\)-norm loss function instead of the \(L_{2,1}\)-norm loss.
Model formulation. Formally, let \(\mathbf X =(\mathbf x _1,\mathbf x _2,...,\mathbf x _n)\in \mathbb {R}^{p\times n}\) be the image set data. Our \(L_1\) norm based MPCA (L1-MPCA) is formulated as,
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \Vert \mathbf X -\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf U {} \mathbf V ^{{\mathrm{T}}}\Vert _{1} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(3)
where the \(L_1\)-norm loss function is defined as \(\Vert \mathbf A \Vert _1=\sum _i\sum _j | \mathbf A _{ij}|\). The \(L_1\)-norm loss makes the proposed L1-MPCA robust to both corruption noise/large errors and outliers. Note that the above L1-MPCA can be regarded as a natural extension of the traditional L1-PCA model [2, 13] that further removes the optimal mean automatically from the input data set \(\mathbf X \).
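The robustness claim can be illustrated with a one-dimensional toy example (ours, not from the paper): the minimizer of a squared loss is the mean, which an outlier drags away, while the minimizer of an \(L_1\) loss is the median, which stays put:

```python
import numpy as np

data = np.array([1.0, 1.1, 0.9, 1.0, 100.0])  # last entry is an outlier

l2_fit = data.mean()      # argmin_c sum_i (x_i - c)^2
l1_fit = np.median(data)  # argmin_c sum_i |x_i - c|

print(l2_fit, l1_fit)     # 20.8 vs 1.0: the L1 fit ignores the outlier
```

The same effect carries over entrywise to the matrix objective in Eq. (3).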
Optimization. We present an effective updating algorithm to solve the L1-MPCA model. Firstly, Eq. (3) can be rewritten equivalently as
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b ,\mathbf E } \Vert \mathbf E \Vert _{1} \qquad s.t. \quad \mathbf X -\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf U {} \mathbf V ^{{\mathrm{T}}}=\mathbf E ,\quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(4)
We use the Augmented Lagrange Multiplier (ALM) method to solve this problem. ALM solves a sequence of subproblems
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b ,\mathbf E } \Vert \mathbf E \Vert _{1} + \langle \mathrm {\Omega },\mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf E \rangle + \frac{\mu }{2}\Vert \mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf E \Vert ^2_{F} \quad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(5)
where \(\mathrm {\Omega }\) is the matrix of Lagrange multipliers and \(\mu \) is the penalty parameter. The algorithm has two major parts, i.e., solving the sub-problems and updating the parameters \((\mathrm {\Omega },\mu )\).
First, we rewrite the objective function of Eq. (5) as
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b ,\mathbf E } \Vert \mathbf E \Vert _{1} + \frac{\mu }{2}\Vert \mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}-\mathbf E +\frac{\mathrm {\Omega }}{\mu }\Vert ^2_{F} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(6)
Then, we iteratively solve the following sub-problems until convergence.
(1) Solve \(\mathbf U ,\mathbf V ,\mathbf b \) while fixing \(\mathbf E \). The problem becomes
$$\begin{aligned} \begin{aligned} \min _\mathbf{U ,\mathbf V ,\mathbf b } \Vert (\mathbf X -\mathbf E +\frac{\mathrm {\Omega }}{\mu }) -\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}- \mathbf UV ^{{\mathrm{T}}}\Vert ^2_{F} \qquad s.t. \quad \mathbf U ^{{\mathrm{T}}}\mathbf U =\mathbf I \end{aligned} \end{aligned}$$(7)
This is standard MPCA [19] and can be solved effectively in closed form.
(2) Solve \(\mathbf E \) while fixing \(\mathbf U ,\mathbf V ,\mathbf b \). The problem becomes
$$\begin{aligned} \begin{aligned} \min _\mathbf E \quad \Vert \mathbf E \Vert _{1} + \frac{\mu }{2}\Vert \mathbf E -(\mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}} +\frac{\mathrm {\Omega }}{\mu }) \Vert ^2_{F} \end{aligned} \end{aligned}$$(8)
This problem has the well-known closed-form soft-thresholding solution
$$\begin{aligned} \begin{aligned} \mathbf E _{ij} = \mathrm {sign}(\mathbf K _{ij})(|\mathbf K _{ij}| - \dfrac{1}{\mu })_{+}, \quad \mathbf K = \mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}}+\frac{\mathrm {\Omega }}{\mu } \end{aligned} \end{aligned}$$(9)
(3) At the end of each ALM iteration, \(\mathrm {\Omega }\) and \(\mu \) are updated as
$$\begin{aligned} \begin{aligned}&\mathrm {\Omega } = \mathrm {\Omega } + \mu (\mathbf X -\mathbf U {} \mathbf V ^{{\mathrm{T}}}-\mathbf b {} \mathbf 1 ^{{\mathrm{T}}} - \mathbf E ) \\&\mu = \rho \mu \end{aligned} \end{aligned}$$(10)
where \(\rho > 1\).
4 Application: Image Set Representation and Classification
In this section, we apply the proposed L1-MPCA in image set representation and classification tasks. Our image set representation and classification method contains two main steps.
First, given an image set \(\mathbf X =(\mathbf x _1,\mathbf x _2,\cdots \mathbf x _n)\), we use the proposed L1-MPCA to compute the optimal \(\mathbf U ^*,\mathbf V ^*\) and mean vector \(\mathbf b ^*\). We then obtain the optimal low-rank representation \(\mathbf Z \) as
$$\begin{aligned} \begin{aligned} \mathbf Z =\mathbf U ^*{} \mathbf V ^{*{\mathrm{T}}} +\mathbf b ^*{} \mathbf 1 ^{{\mathrm{T}}} \end{aligned} \end{aligned}$$(11)
Compared with the original image set data \(\mathbf X \), image noise and outliers are well suppressed in the low-rank representation \(\mathbf Z \).
Second, based on the low-rank representation \(\mathbf Z \), we can use image set feature extraction and learning methods such as Covariance Discriminative Learning (CDL) [22], Covariate-relation graph (CRG) [6] and Manifold-Manifold Distance (MMD) [23] to conduct image set classification tasks.
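As one concrete sketch of this second step, a CDL-style covariance feature can be computed from \(\mathbf Z\) and mapped to Euclidean space via the matrix logarithm. This is a common choice for covariance-based set features; the exact feature extraction in [22] may differ in its details:

```python
import numpy as np

def covariance_feature(Z, eps=1e-3):
    """Covariance descriptor of an image set representation Z (p x n),
    regularized and mapped with the matrix log (log-Euclidean)."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    C = Zc @ Zc.T / max(Z.shape[1] - 1, 1) + eps * np.eye(Z.shape[0])
    w, Q = np.linalg.eigh(C)                 # C is symmetric positive definite
    return Q @ np.diag(np.log(w)) @ Q.T      # symmetric matrix feature
```

Two sets can then be compared by, e.g., the Frobenius distance between their features.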
5 Experiments
To evaluate the effectiveness of the proposed L1-MPCA method, we apply it to image set representation and classification tasks. As image set learning methods, we use several recent approaches: Covariance Discriminative Learning (CDL) [22], Covariate-relation graph (CRG) [6], Manifold-Manifold Distance (MMD) [23], Set-to-Set Distance Metric Learning (SSDML) [27] and Discriminative Canonical Correlations (DCC) [15]. Following [23], the MMD method learns subspaces that preserve 95% of the data energy via PCA. For the discriminative learning in CDL, we choose PLS. For SSDML, we set \(\nu = 1, \lambda _1 = 0.001\) and \(\lambda _2 = 0.1\).
5.1 Datasets and Settings
In our experiments, we test L1-MPCA on four datasets: YouTube Celebrities (YTC) [14], ETH-80 [17], Honda/UCSD [16] and CMU MoBo [8]. In each image set, we resize all images to \(20 \times 20\) intensity images. The datasets are described as follows.
- ETH-80 [17] contains image sets of 8 categories; each category has 10 objects with 41 views per object, spaced equally over the viewing hemisphere, for a total of 3280 images. For each category, we randomly choose 5 objects for training and the remaining 5 objects for testing.
- YouTube-Celebrities (YTC) [14] contains 1910 video clips of 47 celebrities (actors and politicians). Most of the videos are of low resolution and highly compressed, which leads to noisy, low-quality image frames. Each clip contains hundreds of frames. For this dataset, we randomly choose 3 sets for training and 6 sets for testing.
- Honda/UCSD [16] consists of 59 video sequences belonging to 20 different persons. Each sequence contains about 400 frames covering large variations, and each individual has at least two videos. For this dataset, we randomly select one sequence for training and the rest for testing.
- CMU MoBo [8] has 96 sequences of 24 persons, captured under different walking situations (such as inclined walk, and slow walk holding a ball to inhibit arm swing). Each video is further divided into four illumination sets; the first set is used for training and the remaining sets for testing.
5.2 Results Analysis
To evaluate the benefit of the proposed L1-MPCA low-rank representation, we compare our method with the original data and the L1-PCA method [2, 13]. Figure 1 summarizes the average classification results on the four datasets. (1) Compared with the original image set \(\mathbf X \), the proposed L1-MPCA method significantly improves the image set classification results, which clearly demonstrates the benefit and effectiveness of L1-MPCA for image set representation, leading to better classification. (2) The proposed L1-MPCA generally performs better than L1-PCA [2], which clearly demonstrates the benefit of further considering the optimal mean vector in the low-rank representation.
5.3 Robustness to Noise
To evaluate the robustness of L1-MPCA to noise possibly appearing in the testing image set data, we randomly add noise to the image set datasets. Here, we add two kinds of noise: salt &amp; pepper noise and block noise. For each kind, we add various levels of noise and test our method on these noisy image sets. Tables 1, 2, 3 and 4 show the accuracies of all the compared methods across different noise levels. From the results, we note that: (1) As the noise level increases, our L1-MPCA method still maintains better performance than using the original image set data, which clearly indicates the noise-removal ability of the proposed method. (2) L1-MPCA performs better than L1-PCA [2], indicating the greater robustness of L1-MPCA in noisy-data reconstruction.
6 Conclusion
In this paper, we propose a new method, called \(L_1\) norm Mean PCA (L1-MPCA), for image set representation and learning problems. L1-MPCA is robust to both noise and outliers, which encourages robust image set learning. An effective update algorithm is proposed to solve the L1-MPCA model. Experimental results on several datasets show the benefit and robustness of the proposed method. In future work, we will further consider the manifold structure of the data in our L1-MPCA model.
References
Arandjelovic, O., Shakhnarovich, G., Fisher, J., Cipolla, R., Darrell, T.: Face recognition with image sets using manifold density divergence. In: 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 581–588. IEEE (2005)
Cao, Y., Jiang, B., Chen, Z., Tang, J., Luo, B.: Low-rank image set representation and classification. In: Liu, C.-L., Hussain, A., Luo, B., Tan, K.C., Zeng, Y., Zhang, Z. (eds.) BICS 2016. LNCS (LNAI), vol. 10023, pp. 321–330. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49685-6_29
Cevikalp, H., Triggs, B.: Face recognition based on image sets. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2567–2573. IEEE (2010)
Chen, S., Sanderson, C., Harandi, M.T., Lovell, B.C.: Improved image set classification via joint sparse approximated nearest subspaces. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 452–459. IEEE (2013)
Chen, S., Wiliem, A., Sanderson, C., Lovell, B.C.: Matching image sets via adaptive multi convex hull. In: 2014 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1074–1081. IEEE (2014)
Chen, Z., Jiang, B., Tang, J., Luo, B.: Image set representation and classification with covariate-relation graph. In: IEEE Conference on Asian Conference and Pattern Recognition (ACPR), pp. 750–754. IEEE (2015)
Cui, Z., Shan, S., Zhang, H., Lao, S., Chen, X.: Image sets alignment for video-based face recognition. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2626–2633. IEEE (2012)
Gross, R., Shi, J.: The CMU motion of body (MoBo) database. Technical report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University (2001)
Hamm, J., Lee, D.D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 376–383. ACM (2008)
Harandi, M.T., Sanderson, C., Shirazi, S., Lovell, B.C.: Graph embedding discriminant analysis on grassmannian manifolds for improved image set matching. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2705–2712. IEEE (2011)
Hu, Y., Mian, A.S., Owens, R.: Sparse approximated nearest points for image set classification. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 121–128. IEEE (2011)
Jiang, B., Ding, C., Luo, B., Tang, J.: Graph-laplacian PCA: closed-form solution and robustness. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3492–3498. IEEE (2013)
Ke, Q., Kanade, T.: Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In: 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 739–746. IEEE (2005)
Kim, M., Kumar, S., Pavlovic, V., Rowley, H.: Face tracking and recognition with visual constraints in real-world videos. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE (2008)
Kim, T.-K., Kittler, J., Cipolla, R.: Discriminative learning and recognition of image set classes using canonical correlations. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1005–1018 (2007)
Lee, K.-C., Ho, J., Yang, M.-H., Kriegman, D.: Video-based face recognition using probabilistic appearance manifolds. In: 2003 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. I–313. IEEE (2003)
Leibe, B., Schiele, B.: Analyzing appearance and contour based methods for object categorization. In: 2003 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. II–409. IEEE (2003)
Lu, J., Wang, G., Moulin, P.: Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 329–336. IEEE (2013)
Nie, F., Yuan, J., Huang, H.: Optimal mean robust principal component analysis. In: Proceedings of the 31st International Conference on International Conference on Machine Learning, vol. 32, pp. 1062–1070 (2014)
Shakhnarovich, G., Fisher, J.W., Darrell, T.: Face recognition from long-term observations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002 Part III. LNCS, vol. 2352, pp. 851–865. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47977-5_56
Wang, R., Chen, X.: Manifold discriminant analysis. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 429–436. IEEE (2009)
Wang, R., Guo, H., Davis, L.S., Dai, Q.: Covariance discriminative learning: a natural and efficient approach to image set classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2496–2503. IEEE (2012)
Wang, R., Shan, S., Chen, X., Gao, W.: Manifold-manifold distance with application to face recognition based on image set. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE (2008)
Wang, W., Wang, R., Huang, Z., Shan, S., Chen, X.: Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2048–2057. IEEE (2015)
Yamaguchi, O., Fukui, K., Maeda, K.-I.: Face recognition using temporal image sequence. In: 1998 Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 318–323. IEEE (1998)
Yang, M., Zhu, P., Van Gool, L., Zhang, L.: Face recognition based on regularized nearest points between image sets. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1–7. IEEE (2013)
Zhu, P., Zhang, L., Zuo, W., Zhang, D.: From point to set: extend the learning of distance metrics. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 2664–2671. IEEE (2013)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (61602001, 61472002, 61671018); the Natural Science Foundation of Anhui Province (1708085QF139); the Natural Science Foundation of Anhui Higher Education Institutions of China (KJ2016A020); the Co-Innovation Center for Information Supply &amp; Assurance Technology, Anhui University; and the Open Projects Program of the National Laboratory of Pattern Recognition.
Cao, Y., Jiang, B., Tang, J., Luo, B. (2017). Image Set Representation with \(L_1\)-Norm Optimal Mean Robust Principal Component Analysis. In: Zhao, Y., Kong, X., Taubman, D. (eds) Image and Graphics. ICIG 2017. Lecture Notes in Computer Science(), vol 10667. Springer, Cham. https://doi.org/10.1007/978-3-319-71589-6_11