1 Introduction

Many problems in image understanding involve some kind of dimensionality reduction [16]. Recently, many dimensionality reduction methods have been proposed, such as PCA, LDA, LPP [1], Isomap [7], and LLE [8]. Among these methods, PCA is a powerful and popular linear technique for extracting low-dimensional manifold structure from high-dimensional data, and it has been widely used in pattern recognition tasks such as face recognition and object recognition. PCA seeks the optimal linear combination of the input coordinates that minimizes the reconstruction error of the input data, forming a low-dimensional subspace. The corresponding new coordinates are called principal vectors. It is often the case that a small number of important principal vectors suffices to represent the original data, which furthermore reduces the noise carried by the unimportant principal vectors. PCA provides an efficient way to compress the data with minimal information loss through the eigenvalue decomposition of the data covariance matrix. Moreover, the principal vectors are uncorrelated and span the linear subspace closest to the data, which is useful in subsequent statistical analysis.

Many variants of PCA have been proposed to improve its performance. d'Aspremont et al. [9] proposed DSPCA, which relaxes a hard cardinality constraint with a convex approximation. In [10], Ron Zass and Amnon Shashua proposed a nonnegative sparse PCA to capture the nonnegativity and sparseness of real-world data. Moreover, since real-world observations are often corrupted by noise, the recovered principal vectors might not be the desired ones, and several efforts have been made to make PCA robust to noisy observations [11-13]. For example, Candes et al. [11] proposed to decompose the observations into a low-rank matrix and a noise term, which makes the model robust to corruptions, and Goes et al. [13] proposed three stochastic approximation algorithms for robust PCA with smaller storage requirements and lower runtime complexity. Because PCA and its variants are linear transformations of the original space, they cannot capture the nonlinear structure of the data, yet many kinds of data lie in nonlinear subspaces [1]. To address this problem, Bernhard Scholkopf et al. [14, 15] proposed a nonlinear form of PCA using the kernel method, called Kernel PCA (KPCA). KPCA maps the original feature space into a high-dimensional feature space and seeks the principal vectors in the mapped space, using the "kernel trick" to solve the resulting problem. It has been shown that KPCA outperforms PCA in pattern recognition problems with the same number of principal vectors, and its performance can be improved further by using more components than PCA. However, KPCA suffers from memory and computational efficiency problems when the number of training samples is large [16]. To alleviate this, M. Tipping [17] proposed to select a subset of the training samples to approximate the covariance matrix using a maximum likelihood approach, and Sanparith Marukatat addressed the same problem using kernel k-means and preimage reconstruction algorithms. In addition, Honeine [18] proposed an online version of kernel PCA to deal with large-scale datasets. Another drawback of KPCA is that it fails to consider the intrinsic geometric structure of the data: the only objective of KPCA and PCA is to reduce the reconstruction error, without any constraint that preserves neighborhood relationships. Yet it is a natural and reasonable assumption that a good projection should map two data points that are close in the original space to two points that are also close in the projected feature space [14]. KPCA does not take this constraint into consideration explicitly.

In this paper, we aim to address this limitation of KPCA and propose a novel kernel PCA that preserves the locality relationships of the original feature space by means of the graph Laplacian [1, 19], which encodes the neighborhood relationships of the data. We call this method Locality Preserving Kernel PCA (LPKPCA).

This paper is organized as follows. Related work on the locality preserving constraint is introduced in Sect. 2. A brief review of PCA and KPCA is given in Sect. 3. In Sect. 4 the new objective function and the derivation of LPKPCA are presented in detail. In Sect. 5 experimental results are reported to compare KPCA and LPKPCA on several datasets, including the ORL face dataset, Yale Face Database B and the Scene 15 dataset. Finally, Sect. 6 concludes the paper and points out future work.

2 Related Works

The concept of locality preserving dimensionality reduction can be traced back to [7, 8]. The locality constraint requires the dimensionality reduction projection to preserve neighborhood relationships, which has been shown to be a very reasonable assumption [1, 20, 21]. In [19], Mikhail Belkin and Partha Niyogi proposed Laplacian Eigenmaps to find a low-dimensional embedding of data that lies in a high-dimensional feature space. X. He et al. [1] extended this idea and proposed Locality Preserving Projection (LPP) to find the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. Although LPP is a linear transformation, it can capture the intrinsic structure embedded in the data. Following LPP and the spirit of the locality constraint, many new dimensionality reduction methods have been proposed. In [20], Deng Cai et al. added the locality constraint to Nonnegative Matrix Factorization and proposed Locality Preserving Nonnegative Matrix Factorization (LPNMF) to improve performance on large high-dimensional databases. Quanquan Gu et al. [21] also focused on the locality preserving property and incorporated it into the Weighted Maximum Margin Criterion (WMMC) for text classification, yielding the Local Relevance Weighted Maximum Margin Criterion (LRWMMC). In image classification, sparse coding has proved to be a successful coding method [22]. However, Wang et al. [6] argued that the locality constraint is more natural than the sparseness constraint and proposed Locality-constrained Linear Coding (LLC), which is efficient and achieved state-of-the-art performance in image classification.

Inspired by the works above, we incorporate the locality constraint into KPCA to obtain a dimensionality reduction projection that reduces the reconstruction error and preserves locality simultaneously. To formalize the locality preserving constraint, a neighborhood graph with adjacency matrix \(\mathbf {W}\) is built and a Laplacian matrix \(\mathbf {L}\) is constructed from the graph. We add the Laplacian matrix \(\mathbf {L}\) to the objective function of KPCA and maximize the new objective function using the "kernel trick". More details are given in Sect. 4.

3 A Brief Review of PCA and Kernel PCA

PCA and KPCA are widely used in image understanding and pattern recognition. In this section, brief reviews of PCA and KPCA are given in Sects. 3.1 and 3.2, respectively.

3.1 PCA

The aim of PCA is to seek the optimal orthogonal bases of the original space that reconstruct the input samples with minimal reconstruction error, thereby compressing the data.

Let \(\mathbf {X} = [{\mathbf {x}_1},{\mathbf {x}_2},\ldots ,{\mathbf {x}_N}] \in {\mathbb {R}^{D \times N}}\) be the centered training set, where D is the dimensionality of the original feature space and N is the number of training samples. Let \(\mathbf {U} = \{ {\mathbf {u}_1},{\mathbf {u}_2}, \ldots ,{\mathbf {u}_\infty }\}\) be the complete orthonormal set, where

$$\begin{aligned} \mathbf {u}_i^T{\mathbf {u}_j} = \left\{ {\begin{array}{*{20}{c}} {1,\mathrm{{ }}j = i} \\ {0,\mathrm{{ }}j \ne i} \\ \end{array}} \right. \end{aligned}$$
(1)

Thus each sample \(\mathbf {x}\) can be represented as

$$\begin{aligned} \mathbf {x} = \sum \limits _{j = 1}^\infty {{c_j}{\mathbf {u}_j}} \end{aligned}$$
(2)

If we only adopt a subset of \(\mathbf {U}\) to approximate \(\mathbf {x}\), which is denoted as \(\hat{\mathbf {U}} = \{ {\mathbf {u}_1},{\mathbf {u}_2},\ldots ,{\mathbf {u}_d}\}\), the approximated feature \(\hat{\mathbf {x}}\) can be expressed as

$$\begin{aligned} \hat{\mathbf {x}} = \sum \limits _{j = 1}^d {{c_j}{\mathbf {u}_j}} \end{aligned}$$
(3)

As a result, the expected reconstruction error \(\xi \) can be represented as

$$\begin{aligned} \xi = E[{(\mathbf {x} - \hat{\mathbf {x}})^T}(\mathbf {x} - \hat{\mathbf {x}})] \end{aligned}$$
(4)

Substituting (2) and (3) into (4) and using the orthonormality in (1), the cross terms vanish and the error becomes

$$\begin{aligned} \xi = E[\sum \limits _{j = d + 1}^\infty {c_j^2}] \end{aligned}$$
(5)

Because \({c_j} = \mathbf {u}_j^T\mathbf {x}\), we get

$$\begin{aligned} \xi = E[\sum \limits _{j = d + 1}^\infty {\mathbf {u}_j^T\mathbf {x}{\mathbf {x}^T}{\mathbf {u}_j}}] \end{aligned}$$
(6)

Since the total variance \(E[{\mathbf {x}^T}\mathbf {x}]\) is fixed, minimizing \(\xi \) is equivalent to maximizing the variance retained by the first d bases. Replacing the expectation with the empirical average over the training set, the objective function can be formulated as

$$\begin{aligned} \hat{\mathbf {U}}= & {} \mathop {\arg \max }\limits _{\hat{\mathbf {U}}} (\frac{1}{2}||{\hat{\mathbf {U}}^T}\mathbf {X}||_F^2)\\&s.t.{\hat{\mathbf {U}}^T}\hat{\mathbf {U}} = \mathbf {I}\nonumber \end{aligned}$$
(7)

Using the method of Lagrange multipliers, the optimal \(\hat{\mathbf {U}}\) is obtained from the eigenvalue decomposition of \(\mathbf {X}{\mathbf {X}^T}\): the columns of \(\hat{\mathbf {U}}\) are the eigenvectors associated with the d largest eigenvalues.
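To make this concrete, the following is a minimal NumPy sketch of this eigendecomposition view of PCA. The function name pca_fit and the explicit centering step are our own choices for illustration; the D-by-N column layout matches the notation above.

```python
import numpy as np

def pca_fit(X, d):
    """PCA by eigendecomposition of X X^T.

    X : (D, N) centered data matrix, columns are samples (as in the text).
    d : number of principal vectors to keep.
    Returns U_hat whose columns are the top-d eigenvectors of X X^T.
    """
    S = X @ X.T                                # (D, D) scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    return eigvecs[:, order[:d]]

# Usage sketch: center the data, fit, then project.
# X = X_raw - X_raw.mean(axis=1, keepdims=True)
# U_hat = pca_fit(X, d=60)
# codes = U_hat.T @ X                          # d x N low-dimensional representation
```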

3.2 Kernel PCA

As mentioned in Sect. 1, traditional PCA only captures linear structure in the data, whereas much real-world data lies in a nonlinear embedding space. To capture such nonlinear relationships, Kernel PCA (KPCA) was proposed by Bernhard Scholkopf et al. [15]. It has been shown that KPCA performs better than PCA on many problems.

Specifically, let \({\varPhi }\) be a mapping from the original feature space to a kernel space that satisfies the Mercer condition. The inner product of the mapped features \({\varPhi }(\mathbf {x})\) and \({\varPhi }(\mathbf {y})\) can then be represented as

$$\begin{aligned} \kappa (\mathbf {x},\mathbf {y}) = {\varPhi }{(\mathbf {x})^T}{\varPhi }(\mathbf {y}) \end{aligned}$$
(8)

Thus the kernel matrix (Gram Matrix) can be represented as

$$\begin{aligned} \mathbf {K} = [\kappa ({\mathbf {x}_i},{\mathbf {x}_j})],\mathrm{{ }}i,j = 1,2,\ldots ,N \end{aligned}$$
(9)

And the centered kernel matrix \(\hat{\mathbf {K}}\) is

$$\begin{aligned} \hat{\mathbf {K}} = \mathbf {K} - {\mathbf {E}_N}\mathbf {K} - \mathbf {K}{\mathbf {E}_N} + {\mathbf {E}_N}\mathbf {K}{\mathbf {E}_N} \end{aligned}$$
(10)

where \(\mathbf {E}_N\) is the \(N\times N\) matrix with all elements equal to \(\frac{1}{N}\).

KPCA uses the "kernel trick" to seek the optimal orthogonal bases in the mapped feature space that minimize the reconstruction error. The transformed representation of a sample \(\mathbf {x}\) can be expressed as

$$\begin{aligned} \mathbf {y} = {\mathbf {Q}^T}\kappa (\mathbf {X},\mathbf {x}) \end{aligned}$$
(11)

where \(\mathbf {Q} = [{\mathbf {\alpha } _1},{\mathbf {\alpha } _2},\ldots ,{\mathbf {\alpha } _D}]\) collects the top D eigenvectors of the centered kernel matrix \(\hat{\mathbf {K}}\), each divided by the square root of its corresponding eigenvalue.

It can be seen that KPCA obtains, via the "kernel trick", a linear transformation in a high-dimensional kernel space that minimizes the reconstruction error; this transformation may be nonlinear in the original space, so KPCA can capture nonlinear relationships in the embedded data space. KPCA is efficient and stable, and is widely used in many areas of signal processing where dimensionality reduction is applied.
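For concreteness, a minimal sketch of KPCA as summarized in Eqs. (8)-(11) is given below, assuming an RBF kernel (the kernel used in our experiments). The helper names are ours, samples are stored as rows here, and the centering of the test kernel vector is omitted, just as Eq. (11) leaves it implicit.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); A: (Na, D), B: (Nb, D)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_fit(X, d, sigma):
    """X: (N, D) training samples as rows. Returns Q of Eq. (11)."""
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma)                    # Gram matrix, Eq. (9)
    E = np.full((N, N), 1.0 / N)
    K_hat = K - E @ K - K @ E + E @ K @ E          # centering, Eq. (10)
    eigvals, eigvecs = np.linalg.eigh(K_hat)
    order = np.argsort(eigvals)[::-1][:d]          # top-d components (assumed positive)
    return eigvecs[:, order] / np.sqrt(eigvals[order])  # scale by 1/sqrt(eigenvalue)

def kpca_transform(Q, X_train, x, sigma):
    """y = Q^T kappa(X, x), Eq. (11); x: (D,) test sample."""
    k = rbf_kernel(X_train, x[None, :], sigma)     # (N, 1) kernel vector
    return Q.T @ k[:, 0]
```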

However, the derivation of KPCA does not consider the neighborhood relationship preserving constraint, which has proved to be very important in many related works [6]. It is a natural and reasonable assumption, and we intend to add it to KPCA for better performance.

4 Locality Preserving Kernel PCA

In this section, the objective function and the mathematical derivation of the proposed LPKPCA are presented in detail.

The objective function of KPCA is

$$\begin{aligned} \hat{\mathbf {U}}= & {} \mathop {\arg \max }\limits _{\hat{\mathbf {U}}} (\frac{1}{2}||{\hat{\mathbf {U}}^T}\hat{\varPhi }(\mathbf {X})||_F^2)\\&s.t.\,\,{\hat{\mathbf {U}}^T}\hat{\mathbf {U}} = \mathbf {I}\nonumber \end{aligned}$$
(12)

where \(\hat{\varPhi }(\mathbf {X})\) is the zero-mean collection of the features in the kernel space. To add the locality constraint to the objective function of KPCA, we first model the neighborhood relationships in the original feature space with an adjacency matrix \(\mathbf {W} \in \mathbb {R}^{N\times N}\) [1], where

$$\begin{aligned} {\mathbf {w}_{ij}} = {\left\{ \begin{array}{ll} 1, &{} \mathrm{if}\ {\mathbf {x}_j}\ \mathrm{is\ among\ the}\ k\ \mathrm{nearest\ neighbors\ of}\ {\mathbf {x}_i} \\ 0, &{} \mathrm{otherwise} \\ \end{array}\right. } \end{aligned}$$
(13)

Denoting \({\mathbf {y}_i} = {\mathbf {U}^T}{\varPhi }({\mathbf {x}_i})\) as the feature in the transformed space, the locality constraint can be represented as

$$\begin{aligned} R = \sum \limits _{i,j}^{} {||{\mathbf {y}_i} - {\mathbf {y}_j}||_F^2{\mathbf {w}_{ij}}} \end{aligned}$$
(14)

Thus the locality constraint can be added into (12) as

$$\begin{aligned} \hat{\mathbf {U}}= & {} \mathop {\arg \max }\limits _{\hat{\mathbf {U}}} (\frac{1}{2}||{\hat{\mathbf {U}}^T}\hat{\varPhi }(\mathbf {X})||_F^2 - \frac{\lambda }{2}\sum \limits _{i,j}^{} {||{\mathbf {y}_i} - {\mathbf {y}_j}||_F^2{\mathbf {w}_{ij}}} ) \\&s.t.\,\,{\hat{\mathbf {U}}^T}\hat{\mathbf {U}} = \mathbf {I}\nonumber \end{aligned}$$
(15)

where \(\lambda \) is a tradeoff between the reconstruction error and the preservation of locality; a larger \(\lambda \) places more weight on the locality term. The Laplacian matrix \(\mathbf {L}\) is defined as \(\mathbf {L} = \mathbf {D}-\mathbf {W}\) [29], where \(\mathbf {D}\) is the diagonal matrix with \({\mathbf {D}_{ii}} = \sum \limits _j^{} {{\mathbf {w}_{ij}}}\).
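As an illustration of this construction, a small sketch of Eq. (13) and of \(\mathbf {L} = \mathbf {D}-\mathbf {W}\) follows. The symmetrization of \(\mathbf {W}\) is our own assumption, since the k-nearest-neighbor relation is not symmetric in general.

```python
import numpy as np

def knn_graph_laplacian(X, k):
    """Build the k-NN adjacency matrix W of Eq. (13) and return L = D - W.

    X : (N, D) samples as rows; Euclidean distance in the original space.
    """
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # exclude self-neighbors
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[:k]          # k nearest neighbors of x_i
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize (our assumption)
    D = np.diag(W.sum(axis=1))                # D_ii = sum_j w_ij
    return D - W                              # Laplacian matrix L
```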

Using the Laplacian Matrix \(\mathbf {L}\), the objective function of LPKPCA can be rewritten as

$$\begin{aligned} \hat{\mathbf {U}}= & {} \mathop {\arg \max }\limits _{\hat{\mathbf {U}}} (\frac{1}{2}||{\hat{\mathbf {U}}^T}\hat{\varPhi }(\mathbf {X})||_F^2 - \lambda tr(\mathbf {YL}{\mathbf {Y}^T}))\\&s.t.\,\,{\hat{\mathbf {U}}^T}\hat{\mathbf {U}} = \mathbf {I}\nonumber \end{aligned}$$
(16)

where \(\mathbf {Y} = [ {\mathbf {y}_1},{\mathbf {y}_2},\ldots ,{\mathbf {y}_N}] \) and \(tr(\cdot )\) denotes the matrix trace.

To obtain the optimal orthogonal bases \(\hat{\mathbf {U}}\), the objective function (16) can be rewritten as

$$\begin{aligned} \hat{\mathbf {U}}= & {} \mathop {\arg \max }\limits _{\hat{\mathbf {U}}} (tr({\hat{\mathbf {U}}^T}(\hat{\varPhi }(\mathbf {X})\hat{\varPhi }{(\mathbf {X})^T}- \lambda \hat{\varPhi }(\mathbf {X})\mathbf {L}\hat{\varPhi }{(\mathbf {X})^T})\hat{\mathbf {U}})) \end{aligned}$$
(17)

Thus each column \(\mathbf {u}\) of \(\hat{\mathbf {U}}\) satisfies the following optimization problem

$$\begin{aligned}&\mathop {\arg \min }\limits _{\mathbf {u}} ({\mathbf {u}^T}(\hat{\varPhi }(\mathbf {X})\hat{\varPhi }{(\mathbf {X})^T} - \lambda \hat{\varPhi }(\mathbf {X})\mathbf {L}\hat{\varPhi }{(\mathbf {X})^T})\mathbf {u})\\&s.t.\,\,{{\mathbf {u}}^T}{\mathbf {u}} = 1\nonumber \end{aligned}$$
(18)

Introducing a Lagrange multiplier \(\mu \), we obtain the Lagrangian

$$\begin{aligned} f(\mathbf {u},\mu )= & {} {\mathbf {u}^T}(\hat{\varPhi }(\mathbf {X})\hat{\varPhi }{(\mathbf {X})^T} - \lambda \hat{\varPhi }(\mathbf {X})\mathbf {L}\hat{\varPhi }{(\mathbf {X})^T})\mathbf {u} + \mu ({\mathbf {u}^T}\mathbf {u} - 1) \end{aligned}$$
(19)

then

$$\begin{aligned} \frac{{\partial f}}{{\partial \mathbf {u}}} = 2\left( {\hat{\varPhi }(\mathbf {X})\hat{\varPhi }{{(\mathbf {X})}^T} - \lambda \hat{\varPhi }(\mathbf {X})\mathbf {L}\hat{\varPhi }{{(\mathbf {X})}^T}} \right) \mathbf {u} + 2\mu \mathbf {u} \end{aligned}$$
(20)

Setting \(\frac{{\partial f}}{{\partial \mathbf {u}}} = 0\), it follows that \(\mathbf {u}\) is an eigenvector of \(\lambda \hat{\varPhi }(\mathbf {X})\mathbf {L}\hat{\varPhi }{(\mathbf {X})^T} - \hat{\varPhi }(\mathbf {X})\hat{\varPhi }{(\mathbf {X})^T}\) with eigenvalue \(\mu \).

Denoting \({\mathbf {S}^{\varPhi }} = \hat{\varPhi }(\mathbf {X})(\lambda \mathbf {L} - \mathbf {I})\hat{\varPhi }{(\mathbf {X})^T}\), we obtain

$$\begin{aligned} \mu \mathbf {u}= & {} {\mathbf {S}^{\varPhi }}\mathbf {u}\nonumber \\= & {} \hat{\varPhi }(\mathbf {X})(\lambda \mathbf {L} - \mathbf {I})\hat{\varPhi }{(\mathbf {X})^T}\mathbf {u}\nonumber \\= & {} \hat{\varPhi }(\mathbf {X})(\lambda \mathbf {L} - \mathbf {I}){\mathbf {\alpha }} \end{aligned}$$
(21)

where we define \(\mathbf {\alpha } = \hat{\varPhi }(\mathbf {X})^T\mathbf {u}\).

Thus,

$$\begin{aligned} \mathbf {u} \propto \hat{\varPhi }(\mathbf {X})(\lambda \mathbf {L} - \mathbf {I}){\mathbf {\alpha }} \end{aligned}$$
(22)

Hence, left-multiplying both sides of (21) by \(\hat{\varPhi }(\mathbf {X})^T\) and recalling that \(\mathbf {\alpha } = \hat{\varPhi }(\mathbf {X})^T\mathbf {u}\), we get

$$\begin{aligned} \hat{\mathbf { K}}(\lambda \mathbf {L}-\mathbf {I}){\mathbf {\alpha }} = \mu {\mathbf {\alpha }} \end{aligned}$$
(23)

Thus \(\mathbf {\alpha }\) is an eigenvector of \(\hat{\mathbf {K}}(\lambda \mathbf {L}-\mathbf {I})\). Each \(\mathbf {\alpha }\) is then divided by a factor \({\omega }\) so that the constraint \({{\mathbf {u}}^T}{\mathbf {u}} = 1\) in Eq. (18) is satisfied. The projected feature \(\mathbf {y}\) can then be represented as

$$\begin{aligned} \mathbf {y}= & {} {\mathbf {U}^T}{\varPhi }(\mathbf {x}) \nonumber \\= & {} {\mathbf {Q}^T}\kappa (\mathbf {X},\mathbf {x}) \end{aligned}$$
(24)

where \(\mathbf {Q} = {[{\mathbf {\alpha } _1},{\mathbf {\alpha } _2},\ldots ,{\mathbf {\alpha } _d}]_{N \times d}}\) and \(\kappa (\mathbf {X},\mathbf {x}) = [\kappa ({\mathbf {x}_i},\mathbf {x})], i = 1,2,\ldots ,N\). The pseudo code is illustrated in Algorithm 1.

Algorithm 1. Pseudocode of the proposed LPKPCA.
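Since the pseudocode of Algorithm 1 is not reproduced here, the following sketch shows how we read Eqs. (13)-(24). It reuses the rbf_kernel and knn_graph_laplacian helpers sketched above; the choice of the d leading eigenvectors and the exact form of the scaling factor \(\omega \) are our own interpretation rather than details stated in the text. A new sample can then be projected with kpca_transform from the KPCA sketch, as in Eq. (24).

```python
import numpy as np

def lpkpca_fit(X, d, k, lam, sigma):
    """Locality Preserving Kernel PCA, a sketch following Eqs. (13)-(24).

    X : (N, D) training samples as rows.
    d : number of principal vectors; k, lam: hyperparameters of Eqs. (13)/(15).
    Returns Q (N, d) so that y = Q^T kappa(X, x) as in Eq. (24).
    """
    N = X.shape[0]
    L = knn_graph_laplacian(X, k)                 # Eq. (13) and L = D - W
    K = rbf_kernel(X, X, sigma)                   # Eq. (9)
    E = np.full((N, N), 1.0 / N)
    K_hat = K - E @ K - K @ E + E @ K @ E         # Eq. (10)
    B = lam * L - np.eye(N)
    M = K_hat @ B                                 # Eq. (23): K_hat(lam*L - I) alpha = mu alpha
    eigvals, eigvecs = np.linalg.eig(M)           # M is not symmetric in general
    order = np.argsort(eigvals.real)[::-1][:d]    # keep d leading eigenvectors (our reading)
    cols = []
    for idx in order:
        a = eigvecs[:, idx].real
        # Scale alpha by omega so that u^T u = 1 with u = Phi_hat(X) B alpha (Eq. (22)):
        # u^T u = alpha^T B^T K_hat B alpha.
        omega = np.sqrt(a @ B.T @ K_hat @ B @ a)
        cols.append(a / omega)
    return np.stack(cols, axis=1)
```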

Our LPKPCA can thus be interpreted as a novel KPCA that simultaneously captures the intrinsic geometric structure of the data. The locality preserving constraint can improve performance when the data lies on a low-dimensional Riemannian manifold, as illustrated in [6]. Furthermore, LPKPCA adds only a small computational overhead for constructing the neighborhood graph.

5 Experiments and Results

In this section, we compare the performance of LPKPCA and KPCA on three datasets: the ORL dataset [23], Yale Face Database B [24] and the Scene 15 dataset [25]. Performance is measured by the average per-class accuracy,

$$\begin{aligned} Accuracy = \frac{1}{C}\sum \limits _{i = 1}^{C} {{p_i}} \end{aligned}$$
(25)

where

$$\begin{aligned} {p_i} = \frac{{{{Number\; of\; True\; Positives\; in\; Class\; i}}}}{{{{Total\; Number\; of\; samples\; in\; Class\; i}}}} \end{aligned}$$
(26)

and C is the number of classes. We first describe the experimental settings in Sect. 5.1; the results are then reported in Sects. 5.2, 5.3 and 5.4, respectively.
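In code, Eqs. (25) and (26) amount to averaging the per-class recall; a small sketch (with our own variable names) is:

```python
import numpy as np

def average_accuracy(y_true, y_pred):
    """Eqs. (25)-(26): mean over classes of the per-class accuracy."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]  # p_i
    return float(np.mean(per_class))
```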

5.1 Experiment Preparation

For each dataset, we extract features appropriate to its content, and then KPCA and LPKPCA are applied to these features. We use cross-validation to determine the hyperparameters k and \(\lambda \) in Eqs. (13) and (15), respectively. When building the adjacency matrix \(\mathbf {W}\) and the Laplacian matrix \(\mathbf {L}\), the Euclidean distance in the original space is used to measure neighborhood relationships. As the classifier, Liblinear [26] is adopted, an SVM library for large-scale linear classification that handles multi-class problems. The parameters of Liblinear are set as follows: \(s=0\), \(c=10\), \(e=0.01\). The RBF kernel is adopted to compute the kernel matrix in Eq. (9), with the kernel width chosen separately for each dataset. In the experiments, the two methods are compared over different numbers of principal vectors.
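As a rough sketch of the classification step, the LIBLINEAR Python wrapper might be invoked as below; the liblinearutil import path and its acceptance of plain Python lists are assumptions about that wrapper rather than details given in the paper.

```python
# Sketch of the classification step with the parameters listed above.
from liblinearutil import train, predict

def classify(train_feats, train_labels, test_feats, test_labels):
    """train_feats/test_feats: lists of feature vectors (e.g., KPCA/LPKPCA projections)."""
    model = train(train_labels, [list(f) for f in train_feats], '-s 0 -c 10 -e 0.01')
    pred, _, _ = predict(test_labels, [list(f) for f in test_feats], model)
    return pred
```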

5.2 Results on ORL Face Database

The ORL face dataset [23] contains ten different gray images of each of 40 distinct subjects. The images vary in lighting, facial expression (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses) [23]. Some images from the ORL face database are shown in Fig. 1. All images were taken against a dark background with the subjects in an upright, frontal position. The size of each image is \(92\times 112\) pixels. Following previous work [27], each gray image is converted into a feature vector by concatenating all of its pixels. We split the dataset into training and test sets with an equal number of images. The performance comparison on the ORL dataset with different numbers of principal vectors is shown in Fig. 2, where the hyperparameters k and \(\lambda \) are set to 8 and 0.05, respectively, by cross-validation. We also show the performance comparison between LPKPCA and KPCA for different hyperparameter settings in Tables 1 and 2. It can be seen that the proposed LPKPCA outperforms KPCA significantly.
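As a concrete example of this feature construction (the file handling with Pillow and the optional rescaling parameter are our own additions for illustration), each gray image is simply flattened into one long pixel vector:

```python
import numpy as np
from PIL import Image

def image_to_vector(path, size=None):
    """Load a gray image and concatenate its pixels into one feature vector.

    size : optional (width, height) for rescaling before flattening;
           ORL images are kept at their original 92 x 112 resolution.
    """
    img = Image.open(path).convert('L')        # grayscale
    if size is not None:
        img = img.resize(size)
    return np.asarray(img, dtype=np.float64).ravel()
```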

Fig. 1. Some images from the ORL face database [23].

Table 1. Classification accuracy comparison on the ORL Face Database with 60 principal vectors and \(k=5\)
Table 2. Classification accuracy comparison on the ORL Face Database with 60 principal vectors and \(\lambda =0.05\)
Fig. 2. Classification accuracy on the ORL Face Database with different numbers of principal vectors.

5.3 Results on Yale Face Database B

Fig. 3. Some images from Yale Face Database B [24].

Yale Face Database B contains 5850 gray face images of 10 subjects. Each subject is seen under 576 viewing conditions (9 poses and 64 illuminations). In addition, there is one image with ambient (background) illumination for each pose of each subject, so the total number of images per subject is 585. Images of the 10 individuals are shown in Fig. 3 [24]. The size of each image is \(640 \times 480\), and we rescale the images to \(40 \times 30\). As in the ORL experiment, each image is converted into a 1200-dimensional vector by concatenating its pixels. The dataset is divided into two parts of equal size, used as training and test sets, respectively. By cross-validation, the hyperparameters k and \(\lambda \) are set to 23 and 0.046, respectively. The comparisons with different numbers of principal vectors are shown in Fig. 4, and with different hyperparameter settings in Tables 3 and 4. Again, LPKPCA outperforms KPCA significantly.

Table 3. Classification accuracy comparison on Yale Face Database B with 60 principal vectors and \(k=23\)
Table 4. Classification accuracy comparison on Yale Face Database B with 60 principal vectors and \(\lambda =0.046\)
Fig. 4. Classification accuracy on Yale Face Database B with different numbers of principal vectors.

5.4 Results on Scene 15 Database

The Scene 15 dataset contains 15 scene categories, such as store, office and highway. The number of images per category ranges from 200 to 400, with 4485 images in total. Some images from the Scene 15 database are shown in Fig. 5. The dataset is challenging compared with the ORL and Yale B datasets above because of its large intra-class variance. To extract features that capture the holistic content of each scene image, the GIST descriptor [28] is computed for all images. Following common practice, 100 images per category are selected randomly for training and the remaining ones are used as the test set. The accuracy for different numbers of principal vectors is reported in Fig. 6; k and \(\lambda \) are set to 8 and 0.038, respectively, by cross-validation. We also show the performance comparison for different hyperparameter settings in Tables 5 and 6. Again, LPKPCA performs better than KPCA.

Fig. 5. Some images from the Scene 15 database [25].

Table 5. Classification accuracy comparison on the Scene 15 Database with 60 principal vectors and \(k=8\)
Table 6. Classification accuracy comparison on the Scene 15 Database with 60 principal vectors and \(\lambda =0.038\)
Fig. 6. Classification accuracy on the Scene 15 Database with different numbers of principal vectors.

6 Conclusion

In this paper, a novel kernel PCA approach, called Locality Preserving Kernel PCA (LPKPCA), has been proposed to simultaneously reduce the reconstruction error in the projected feature space and preserve the neighborhood relationships of the original space. The formulation of LPKPCA has been derived, and experimental results show that LPKPCA achieves better performance than KPCA on the ORL Face Database, Yale Face Database B and Scene 15. The gain in performance comes from taking the intrinsic geometric structure of the data into consideration. In future work, we intend to combine the sparseness and locality constraints to seek better dimensionality reduction methods for image understanding and pattern recognition.