1 Introduction

Computer vision (CV) has become a prominent research direction in recent years, driven by advances in both theory and computer hardware. CV covers many research areas, including biometrics technologies such as fingerprint recognition, palm print recognition, and face recognition [3, 9, 24]. Among them, face recognition has received much attention in both academia and industry [2, 15]. However, many external factors affect the recognition rate in practice [5, 16], and eyeglasses are a common problem in face recognition [8]. If a user wears a pair of eyeglasses when registering in a face recognition system, he may fail to pass the system later when he does not wear eyeglasses in the recognition stage. To reduce the negative impact caused by eyeglasses, several algorithms have been proposed.

Extracting more robust facial features is one of the most effective ways to reduce the negative influence of eyeglasses. Martinez [11] proposed to extract local facial features to address the occlusion problem in face recognition. He divided a face into k blocks and computed a feature for each block. By comparing the training and testing sets, he obtained the probability of occlusion for each block; these probabilities were then used to weight each block when Mahalanobis distances were computed in the recognition stage. Zhang et al. [23] proposed the Local Gabor Binary Pattern Histogram Sequence (LGBPHS) method, which applies Gabor filters and local binary pattern (LBP) operators [1, 13, 14] to a face image to extract local facial features for recognition. Yi et al. [19] performed sparse-representation-based classification on the extracted local features. Liu et al. [10] proposed a novel non-uniform division strategy based on the LGBPHS method [23]. The experimental results in these papers all confirm the effectiveness of local features for eyeglass-face recognition.

In this paper, we address the eyeglass-face recognition problem by extracting robust facial features [4, 21]. The proposed feature extraction method performs well in resisting the interference caused by eyeglasses. Firstly, we normalize the face image and extract its local facial features. We use the non-uniform division strategy proposed in [10] to segment a face into several non-overlapping blocks of different sizes. For each block, Gabor filters with different scales and orientations are used to obtain multiple Gabor Magnitude Pictures (GMPs) [18, 23]. We then compute the histogram of each GMP and concatenate these histograms as the local facial features of the block; integrating the histograms of all blocks yields the local facial features of the face image. Secondly, we use the 2D-DFT to extract the global facial features. After transforming the face image into the frequency domain via the 2D-DFT, we obtain its real and imaginary components. In this work, we exploit only the low-frequency coefficients as the global facial features, since they carry the intrinsic global image information [20]. Finally, the extracted local and global facial features are combined via an adaptive weighted fusion approach for face recognition.

The remainder of this paper is organized as follows: Sect. 2 gives a brief review of local facial feature extraction methods. The details of the proposed method are described in Sect. 3. Section 4 presents the experimental evaluations. The conclusion and discussion are given in Sect. 5.

2 A Brief Review of Local Facial Feature Extraction Method

2.1 Facial Feature Extraction Using the LGBPHS Method

Eyeglasses can be viewed as an occlusion of the face. Many algorithms have been proposed for occlusion problems in face recognition and have achieved good performance; LGBPHS is one of the representative methods. It combines the advantages of Gabor filters and LBP operators [22]: Gabor filters are powerful for obtaining robust and discriminative local features, while LBP is a typical visual descriptor that has been widely used in computer vision owing to its effectiveness in extracting the texture information of an image. In the first step, the LGBPHS method convolves a normalized face image with a bank of Gabor filters to obtain a set of GMPs, and then applies the LBP operator to the GMPs to extract LBP feature maps. In the second step, it divides these LBP feature maps uniformly into blocks and computes the statistical histogram of each block. Finally, it concatenates the histograms of all blocks into a feature histogram sequence as the extracted local features. Experimental results show that this method is effective for general occlusion problems.
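To make the pipeline concrete, the following is a minimal sketch of an LGBPHS-style feature extractor, assuming a uniform 4 \(\times \) 4 block division, an 8-neighbour LBP, and an illustrative Gabor frequency rule (the exact filter parameters used in [23] may differ):

```python
# Minimal LGBPHS-style sketch: Gabor magnitudes -> LBP maps -> block histograms.
# The grid size, bin count, and Gabor frequency rule are assumptions.
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel
from skimage.feature import local_binary_pattern

def lgbphs_features(face, scales=5, orientations=8, grid=(4, 4), bins=256):
    feats = []
    for s in range(scales):
        for o in range(orientations):
            k = gabor_kernel(frequency=0.25 / (2 ** (s / 2.0)),  # assumed scale rule
                             theta=o * np.pi / orientations)
            gmp = np.abs(fftconvolve(face, k, mode='same'))      # Gabor Magnitude Picture
            lbp = local_binary_pattern(gmp, P=8, R=1)            # LBP map of the GMP
            for rows in np.array_split(lbp, grid[0], axis=0):    # uniform division
                for block in np.array_split(rows, grid[1], axis=1):
                    h, _ = np.histogram(block, bins=bins, range=(0, bins))
                    feats.append(h)
    return np.concatenate(feats)                                 # histogram sequence
```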

2.2 Facial Feature Extraction Using the Ununiformed Division Strategy

Although the LGBPHS method performs well on general occlusion problems, it has proved less effective for eyeglass-face recognition. The eyeglass-face problem differs from other occlusion problems because eyeglasses always appear around the eyes, and the eyes are among the most important features in face recognition. The LGBPHS method divides a face uniformly into blocks of the same size, which may break the integrity of the eyes when the facial features are extracted. To maintain the integrity of these facial keypoints, an improved non-uniform partition strategy called the Ununiformed Local Gabor Binary Pattern Histogram Sequence (ULGBPHS) was proposed in [10]. This method can be viewed as an extension of LGBPHS that uses a non-uniform partition strategy to extract more robust local features. Its main idea is that the facial keypoints should keep their completeness when the face feature map is divided. Therefore, the method partitions a normalized face image non-uniformly into blocks of different sizes, applies Gabor filters to each block to obtain its GMPs, and performs LBP on each group of GMPs to obtain the LBP feature maps; a sketch of such a layout is given below. The histograms of the local LBP feature maps of each group are then concatenated into a final feature histogram sequence using a weighted strategy, which serves as the features for face recognition [7, 17]. Experimental results show that the ULGBPHS method improves the accuracy on this problem. However, although the non-uniform strategy handles eyeglass-face problems better than LGBPHS, it does not always work well, since it still focuses on the extraction of local facial features.
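As an illustration of the idea, the sketch below defines one possible non-uniform layout for a hypothetical 128 \(\times \) 128 normalized face, keeping each eye region in a single block; the coordinates are illustrative and are not the layout used in [10]:

```python
# Hypothetical non-uniform block layout for a 128 x 128 normalized face.
# Each tuple is (row, col, height, width); the eye band is split into two
# whole-eye blocks instead of being cut by a uniform grid.
BLOCKS = [
    (0,  0,  24, 128),   # forehead strip
    (24, 0,  40, 64),    # left-eye block, kept whole
    (24, 64, 40, 64),    # right-eye block, kept whole
    (64, 0,  64, 64),    # lower-left (nose/mouth) quadrant
    (64, 64, 64, 64),    # lower-right quadrant
]

def crop_blocks(face, blocks=BLOCKS):
    """Return the list of non-uniform blocks of a normalized face image."""
    return [face[r:r + h, c:c + w] for r, c, h, w in blocks]
```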

3 The Proposed Feature Fusion Method

3.1 Method Analysis

Each face has its own overall appearance. Rather than focusing on local features only, we consider both local and global facial features in the eyeglass-face problem; in other words, we take the overall appearance into account when extracting facial features. Humans can usually recognize a person correctly whether or not he/she wears eyeglasses, because a face is first perceived as a whole at a glance and then confirmed by recognizing the local facial keypoints. In this paper, we therefore use a combination of local and global facial features instead of local features alone. Specifically, we apply the 2D-DFT to extract the global facial features of face images with and without eyeglasses. In fact, reconstructing face images with and without eyeglasses from only the low-frequency coefficients yields two similar blurred face contours, which supports our idea.

3.2 The Procedure of the Proposed Method

Firstly, we use the ULGBPHS method to extract local facial features: Gabor filters of different scales and orientations are convolved with the facial blocks. The convolution is expressed as:

$$\begin{aligned} G_{\mu ,\nu }(x,y)=f(x,y)*\psi _{\mu ,\nu } \end{aligned}$$
(1)

where \(f(x,y)\) represents the pixel value of the segmented face image block f and the operator \(*\) denotes convolution. We call \(G_{\mu ,\nu }(x,y)\), the result of convolving a face image block with a Gabor filter, a GMP. \(\mu \) and \(\nu \) represent the scale and orientation respectively. We use these GMPs in the next step to compute their LBP feature maps by:

$$\begin{aligned} LGBP_{\mu ,\nu }(x,y)=\sum _{p=0}^{7}S(G_{\mu ,\nu }(x_{p},y_{p})-G_{\mu ,\nu }(x,y))2^{p} \end{aligned}$$
(2)

where \(LGBP_{\mu ,\nu }(x,y)\) denotes the feature map obtained by applying the LBP operator to a GMP. \((x,y)\) and \((x_{p},y_{p})\) represent a pixel in the GMP and its eight neighboring pixels respectively. S is the binary thresholding function, which, consistent with Eq. (2), is applied to the difference between a neighbor and the center pixel:

$$\begin{aligned} S(u) = {\left\{ \begin{array}{ll} 1, \quad if \quad u \ge 0\\ 0, \quad if \quad u < 0\\ \end{array}\right. } \end{aligned}$$
(3)

where \(u=G_{\mu ,\nu }(x_{p},y_{p})-G_{\mu ,\nu }(x,y)\).
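A direct NumPy transcription of Eqs. (2) and (3) for a single GMP might look as follows (a sketch; border pixels are simply excluded, and the neighbor ordering is one arbitrary but fixed choice):

```python
# Direct transcription of Eqs. (2)-(3): 8-neighbour LBP of one GMP.
import numpy as np

def lgbp_map(gmp):
    g = np.asarray(gmp, dtype=float)
    # neighbour offsets (one fixed clockwise ordering of the 8 neighbours)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = g[1:-1, 1:-1]                              # G(x, y)
    out = np.zeros(center.shape, dtype=np.uint8)
    for p, (dx, dy) in enumerate(offsets):
        neighbor = g[1 + dx:g.shape[0] - 1 + dx,        # G(x_p, y_p)
                     1 + dy:g.shape[1] - 1 + dy]
        out |= (neighbor >= center).astype(np.uint8) << p   # S(.) * 2^p
    return out
```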

The histogram of a feature map block of a face image, with gray levels ranging over \([0,...,L-1]\), is computed as:

$$\begin{aligned} h_{i}=\sum _{x,y}binary\left\{ LGBP_{split}(x,y)=i\right\} ,\quad i=0,1,...,L-1 \end{aligned}$$
(4)

where \(h_{i}\) denotes the \(i\)-th bin of the histogram of a feature map of a GMP, i refers to the gray level in the feature map, and \(LGBP_{split}(x,y)\) refers to the Local Gabor Binary Pattern (LGBP) feature map of the corresponding segmented face image block. The indicator function binary is defined as:

$$\begin{aligned} binary(A)={\left\{ \begin{array}{ll} 1, \quad if \quad A \text { is true}\\ 0, \quad if \quad A \text { is false} \end{array}\right. } \end{aligned}$$
(5)
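Assuming \(L=256\) gray levels (as produced by an 8-neighbour LBP), Eqs. (4) and (5) amount to a simple bin count per block:

```python
# Eqs. (4)-(5) for one block: h_i counts the pixels whose LGBP value equals i.
import numpy as np

def block_histogram(lgbp_block, levels=256):
    return np.bincount(lgbp_block.ravel().astype(int), minlength=levels)[:levels]
```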

Assume that a face is divided into m blocks; the histogram of the \(r\)-th block is then:

$$\begin{aligned} H_{\mu ,\nu ,r}=(h_{\mu ,\nu ,r,0} ,... ,h_{\mu ,\nu ,r,L-1}) \end{aligned}$$
(6)

In this paper, we use 5 scales (\(\mu =0,...,4\)) and 8 orientations (\(\nu =0,...,7\)). The local facial features are therefore obtained by:

$$\begin{aligned} H_{local}=(H_{0,0,0},...,H_{0,0,m-1}, H_{0,1,0},...,H_{0,1,m-1},...,H_{4,7,m-1}) \end{aligned}$$
(7)

Secondly, we use the 2D-DFT to extract global facial features. Existing algorithms focus only on the extraction of local facial features. However, we can usually recognize a person approximately at first sight, whether he wears eyeglasses or not, because we perceive the face from an overall perspective. Therefore, our method combines global and local facial features: the ULGBPHS method provides the local features, and the 2D-DFT provides the global features. The global feature extraction is given by:

$$\begin{aligned} F(\mu ^{'},\nu ^{'})=\frac{1}{MN}\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}g(x,y)e^{-j2\pi (\frac{\mu ^{'}x}{M}+\frac{\nu ^{'}y}{N})} \end{aligned}$$
(8)

where g represents an \(M\times N\) face image and \(\mu ^{'}\) and \(\nu ^{'}\) refer to the frequency variables. The output of the above formulation can be written as:

$$\begin{aligned} F(\mu ^{'},\nu ^{'})=R(\mu ^{'},\nu ^{'})+jI(\mu ^{'},\nu ^{'}) \end{aligned}$$
(9)

where \(R(\mu ^{'},\nu ^{'})\) and \(I(\mu ^{'},\nu ^{'})\) refer to the real and imaginary parts of \(F(\mu ^{'},\nu ^{'})\). Via the DFT, each face is converted into a real component and an imaginary component in the frequency domain, and we extract the low-frequency part of both components as the global information of the face. We use \(H_{global}=(H_{R},H_{I})\) to represent the global facial information, where \(H_{R}\) and \(H_{I}\) are feature vectors calculated by:

$$\begin{aligned} H_{R}=\sum _{x,y}binary\left\{ g_{R}(x,y)=i\right\} ,i=0,1,...,L-1 \end{aligned}$$
(10)
$$\begin{aligned} H_{I}=\sum _{x,y}binary\left\{ g_{I}(x,y)=i\right\} ,i=0,1,...,L-1 \end{aligned}$$
(11)
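The sketch below illustrates the global-feature step of Eqs. (8)-(11), where \(g_{R}\) and \(g_{I}\) (defined just below) are approximated by the real and imaginary parts of a centred crop of the shifted spectrum, quantized to L levels before histogramming; the band size and the quantization rule are assumptions, since the text does not fix them:

```python
# Sketch of Eqs. (8)-(11): 2D-DFT, low-frequency crop, then histograms of the
# quantized real and imaginary parts. Band size and quantization are assumed.
import numpy as np

def global_features(face, band=16, levels=256):
    F = np.fft.fftshift(np.fft.fft2(face) / face.size)      # Eq. (8), centred
    cy, cx = F.shape[0] // 2, F.shape[1] // 2
    low = F[cy - band:cy + band, cx - band:cx + band]       # low-frequency block

    def hist(part):                                         # Eqs. (10)-(11)
        p = part - part.min()
        q = np.floor(p / (p.max() + 1e-12) * (levels - 1)).astype(int)
        return np.bincount(q.ravel(), minlength=levels)

    return np.concatenate([hist(low.real), hist(low.imag)])  # H_global = (H_R, H_I)
```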

In Eqs. (10) and (11), \(g_{R}(x,y)\) and \(g_{I}(x,y)\) represent the real and imaginary magnitude pictures of a face image, calculated from \(R(\mu ^{'},\nu ^{'})\) and \(I(\mu ^{'},\nu ^{'})\) respectively. Finally, we use a weighted fusion approach to combine the two kinds of facial features. Let \(D_{local,c}^{k}\) and \(D_{global,c}^{k}\) denote the Euclidean distances between the testing sample and the training samples for the local and global facial features respectively, where c refers to the face class and k to the index of the training image in that class, with \(c=1,2,...,C\) and \(k=1,2,...,K\). The final distance between the testing sample and the training samples is calculated by:

$$\begin{aligned} D_{c}^{k}=\omega _{local}*D_{local,c}^{k}+\omega _{global}*D_{global,c}^{k} \end{aligned}$$
(12)

where \(\omega _{local}\) and \(\omega _{global}\) are the distance weight coefficients with \(\omega _{local}+\omega _{global}=1\), and \(D_{c}^{k}\) is the score-level fusion of the distances calculated from the two feature vectors. The final classification result is given by:

$$\begin{aligned} t=\arg \min _{c} D_{c}^{k} \end{aligned}$$
(13)

The testing sample is then assigned to the \(t\)-th class, \(t\in \left\{ 1,2,...,C \right\} \); a minimal sketch of this fusion and decision step is given below. Figure 1 shows the whole procedure of feature extraction.
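In the sketch, the fixed weights are placeholders standing in for the adaptive weights of the proposed method, and each training sample carries a class label so that the nearest sample determines the class:

```python
# Sketch of Eqs. (12)-(13): weighted score-level fusion of the two distances,
# then a nearest-neighbour decision. The weights here are placeholder values
# standing in for the adaptive weights of the proposed method.
import numpy as np

def classify(test_local, test_global, train_local, train_global, labels,
             w_local=0.7, w_global=0.3):                    # w_local + w_global = 1
    d_local = np.linalg.norm(train_local - test_local, axis=1)
    d_global = np.linalg.norm(train_global - test_global, axis=1)
    d = w_local * d_local + w_global * d_global             # Eq. (12)
    return labels[np.argmin(d)]                             # Eq. (13)
```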

Fig. 1. Procedure of feature extraction using our proposed method.

4 Experimental Evaluations

The datasets in our experiments come from two sources. Firstly, we select some typical face datasets and pick out the eyeglass faces and non-eyeglass faces from them; we choose the GT [12] and CMU_PIE [6] datasets for our experiments. To reduce the negative influence of irrelevant factors such as illumination, pose and facial expression, which might affect the final recognition rate, we keep these factors unchanged across the face images of each individual and discard unqualified face images from the original datasets before the experiments. We then rename the remaining images so that the images with eyeglasses form the first half while those without eyeglasses form the second half. Since the individuals in the GT dataset do not always wear eyeglasses, we select the individuals whose samples include images both with and without eyeglasses; the selected GT dataset consists of 330 images of 22 individuals in total. Because of the different poses in the CMU_PIE dataset, we divide it into five sub-datasets, named CMU_PIE_pose05, CMU_PIE_pose07, CMU_PIE_pose09, CMU_PIE_pose27 and CMU_PIE_pose29. CMU_PIE_pose05 consists of 1,120 images of 28 individuals; 540 face images of 27 individuals from CMU_PIE_pose07 are used; 25 individuals with 20 images per person are used from CMU_PIE_pose09; CMU_PIE_pose27 contains 1,120 images of 28 individuals; and from CMU_PIE_pose29 we use 540 images of 27 individuals.

Secondly, we built a face dataset in a real scene, named BCC_Lab_Face. It consists of 50 people with 40 images per person. For each person, we collect four groups of face images: two groups with eyeglasses and two without. During collection, we control the range of illumination, facial pose and facial expression. In addition, we prepare several pairs of eyeglasses of different sizes and colors. For a volunteer who wears eyeglasses, we collect the first group of face images and register another group in which he wears a pair of eyeglasses randomly selected from our prepared set. We also collect the other two groups of face images without eyeglasses under the same conditions. Figures 2, 3 and 4 show some samples from our experimental datasets.

We carry out our experiments in two cases. In the first case, we take the first several face images, with eyeglasses, as training samples and treat the rest as testing samples. In the second case, we take the last several face images, without eyeglasses, as training samples and the rest as testing samples. In this way, we can validate that the features extracted by our proposed method are robust to the eyeglass-face problem; in other words, the extracted facial features can resist the eyeglass problem in face recognition. The division into training and testing sets is detailed in Table 1. According to this setting, we conduct two kinds of comparative experiments corresponding to the two cases. We compare our proposed method with two previous representative methods, LGBPHS and ULGBPHS, to show the superiority of our method. Moreover, we also use the single features involved in our method for comparison.
Here, we use the DFT, Gabor and LBP features as the single facial features in the comparative experiments. Figure 5 shows our experimental results on the different datasets for the two cases. The ID numbers on the x-axis denote the different datasets, representing CMU_PIE_pose05, CMU_PIE_pose07, CMU_PIE_pose09, CMU_PIE_pose27, CMU_PIE_pose29, GT and BCC_Lab_Face respectively, while the y-axis shows the recognition rates of the different methods. Since we mainly focus on feature extraction, we use the Euclidean distance as the common measure in the final recognition stage for all methods, which makes the comparison with our proposed method fair. Figure 5 indicates that our method is the best among the compared methods in both testing cases. The light blue line with diamond markers shows the accuracy of our method on the different datasets; our method achieves a higher recognition rate and also shows stronger stability across the testing datasets.

Fig. 2. One face sample from the selected CMU_PIE dataset.

Fig. 3. Two face samples from the selected GT dataset.

Fig. 4. Two face samples from BCC_Lab_Face.

Table 1. Training and testing sets division.
Fig. 5. Experimental results comparison in the two cases.

5 Conclusion and Discussion

This paper proposed a novel method for the challenging problem of eyeglass-face recognition. Compared with other methods, the proposed method simultaneously takes into account the local and global facial features. In other words, it can extract more discriminative features from the face, which enables it to obtain the best performance. Experimental results on the GT, CMU_PIE, and our collected BCC_Lab_Face datasets demonstrate the effectiveness of the proposed method.

In the future, we will try to employ better classification strategies in the recognition stage to improve the overall performance of the proposed method on the eyeglass-face recognition problem.