1 Introduction

Human biometric recognition has been researched extensively in recent years because of its reliability and security. Many situations in daily life require biometric recognition to confirm an individual’s identity based on physiological and behavioral characteristics [1]. As a new biometric technology, palm vein recognition has considerable potential in terms of robustness, uniqueness, and authentication accuracy. In addition, since palm veins are present only in the living human body and lie beneath the skin, they are difficult for intruders to read, copy, or forge [2]. Therefore, palm vein recognition has great prospects for human identification.

Currently, most traditional methods for palm vein recognition are based on the palm vein skeleton structure. Lajevardi et al. [3] used the Maximum Curvature Algorithm (MCA) to extract the vein skeleton and performed matching with the Biometric Graph Matching (BGM) algorithm. Chen et al. [4] used a Gaussian Matched Filter (GMF) to extract the blood vessels and then applied the Iterative Closest Point (ICP) algorithm for matching. Some other methods also rely on structural features for identification and classification [5,6,7]. Because these methods depend heavily on manually designed features, their generalization ability and recognition accuracy are limited, which means that palm vein recognition still faces difficulties in practical applications.

Over the past few years, deep learning, especially the Convolutional Neural Network (CNN), has attracted growing attention from researchers owing to its powerful learning ability, parallel processing capacity, and strong capability for feature extraction [8], particularly in computer vision and multimedia tasks. Recently, CNNs combined with hashing have become prominent and have been applied successfully in biometrics such as face and palmprint recognition [9]. However, palm vein recognition with hash coding has not yet been studied. Accordingly, in this paper we propose the Deep Hashing Palm vein Network (DHPN), an end-to-end neural network for palm vein recognition built on previous work.

In our work, DHPN employs a modified CNN-F architecture that automatically produces a 128-bit binary code for each palm vein image, which is then used for matching and recognition. The framework of the proposed DHPN method is illustrated in Fig. 1. First, resized palm vein images are fed into the CNN to extract image features. Then, through fully connected layers, a fixed-length palm vein code is obtained by applying the tanh function followed by the sgn function. The loss function is designed so that the network generates similar codes for image samples of the same person, while the codes of different persons differ significantly. In the matching stage, the Hamming distance between the fixed-length codes of two images is calculated as their similarity. If the distance is smaller than the threshold, we conclude that the images are from the same person. Our experimental results show that DHPN reaches an EER of 0.0222% on the PolyU database with 128-bit codes and a 50% training ratio. Several comparative experiments are also conducted to discuss the influence of network structure, code bits, training ratio and databases. The best performance of DHPN reaches the lowest EER = 0% with 256-bit codes and a 50% training ratio, which is better than other state-of-the-art methods.

Fig. 1. The framework of the proposed DHPN method

The contributions of this paper can be summarized as follows.

  1. To the best of our knowledge, this is the first successful use of an end-to-end CNN with hash coding for palm vein recognition.

  2. The proposed DHPN represents image features as a fixed-length binary code, and its identification results reach a lower EER than other state-of-the-art algorithms.

  3. Extensive comparative experiments are conducted to validate the overall performance of the DHPN method in palm vein recognition.

The paper consists of 5 sections. Section 2 introduces the related works of palm vein recognition. Section 3 mainly describes the proposed DHPN, including hashing code method, network structure and the definition of the loss function. Many detailed comparative experiment results are presented in Sect. 4. Section 5 gives the conclusion.

2 Related Work

As a promising new biometric, palm vein recognition has attracted broad research interest over the last decade. Feature extraction is one of the most crucial steps for achieving high recognition performance. Traditional palm vein recognition algorithms used physical patterns, including minutiae points, ridges and texture, to extract features for matching. For instance, a multi-spectral adaptive method [10], a 3D ultrasound method [11] and an adaptive contrast enhancement method [12] have been applied to improve image quality. Ma et al. [13] proposed a palm vein recognition scheme based on an adaptive 2D Gabor filter to optimize parameter selection. Yazdani et al. [14] presented a method that estimates wavelet coefficients with an autoregressive model to extract texture features for verification. Some novel methods were also presented to overcome drawbacks such as image rotation, shadows, blur and deformation [15, 16]. However, as the database grows larger, traditional palm vein recognition techniques tend to have higher time complexity, which adversely affects practical applications.

Recently, deep learning, one of the most promising technologies, has changed conventional thinking and has also been introduced into the field of palm vein recognition. Fronitasari et al. [17] presented a palm vein extraction method that is a modified version of the Local Binary Pattern (LBP) and combined it with a Probabilistic Neural Network (PNN) for matching. In addition, supervised deep hashing has attracted increasing attention in large-scale image retrieval over the last several years owing to its higher accuracy, stability and lower time complexity. Lu et al. [18] proposed a Deep Hashing approach for scalable image search that uses a deep neural network to exploit linear and non-linear relationships. Liu et al. [19] proposed a Deep Supervised Hashing (DSH) scheme for fast image retrieval combined with a CNN architecture. The superior performance of deep hashing approaches in image retrieval has inspired researchers to extend deep hashing from image search to biometrics.

3 The Proposed DHPN Method

In computer vision, the convolutional neural network has become one of the most effective tools as deep learning has developed rapidly in recent years. Meanwhile, with the expansion of the Internet, the amount of image data has grown significantly. To address the storage space and retrieval time required for images, hashing, a representative method for nearest neighbor search, has received extensive attention, and hash coding has been applied successfully to convolutional neural networks [20,21,22]. However, in biometrics, and especially in palm vein recognition, a CNN with hash coding has not yet been reported.

Thus, we propose a Deep Hashing Palm vein Network (DHPN), which automatically obtains the codes of palm vein images for matching. Unlike the prior image coding method [9], DHPN is an end-to-end network, which reduces the reliance on manually designed features and encodes palm vein images directly.

3.1 Hashing Code Method

The purpose of a hashing algorithm is to represent a sample as a fixed-length binary code, with values such as 0/1 or −1/1, so that the original information-rich sample is compressed into a short code string in which similar samples have similar codes and dissimilar samples have dissimilar codes. For example, the Hamming distance between the hash codes of two similar samples should be as small as possible, while the distance between dissimilar samples should be large. By measuring the difference between the hash codes of two images, we can judge whether they belong to the same category. In practice, the calculation can be accelerated with XOR operations.
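The XOR-based comparison mentioned above can be illustrated with a minimal sketch. The helper names `pack_bits` and `hamming_distance` and the 8-bit toy codes are assumptions made for illustration; they are not taken from the DHPN implementation.

```python
def pack_bits(code):
    """Pack a sequence of +1/-1 (or 1/0) values into one integer, one bit per element."""
    value = 0
    for bit in code:
        value = (value << 1) | (1 if bit > 0 else 0)
    return value


def hamming_distance(a, b):
    """Count the differing bits of two packed codes: XOR marks them, then count the ones."""
    return bin(a ^ b).count("1")


# Toy 8-bit codes (hypothetical values); they differ at positions 2 and 5.
code_a = pack_bits([1, -1, 1, 1, -1, -1, 1, -1])
code_b = pack_bits([1, -1, -1, 1, -1, 1, 1, -1])
print(hamming_distance(code_a, code_b))  # 2
```

Packing a code into a single integer lets one XOR and one bit count replace an element-by-element comparison, which is where the speed-up comes from.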

Traditional hashing methods require manually designed features from which the binary encoding is then derived. In deep learning, the convolutional neural network can effectively extract representative features of the image. Therefore, by simply feeding the palm vein image into the trained network and quantizing the network output, we directly obtain the binary code of the corresponding image. This end-to-end training method eliminates manual design steps, reduces feature extraction time, and significantly improves the accuracy of palm vein recognition.

3.2 Structure of DHPN

To find the most suitable neural network for palm vein recognition, we tried two different network structures. As is well known, fine-tuning can be used to obtain image features for learning binary encodings in small-dataset tasks [23]. First, we used the first 5 layers of VGG-16 [24] pre-trained on the ImageNet dataset as the convolutional feature extractor. We then added 3 fully connected layers and adjusted their parameters during training. The last layer comprises 128 neurons. To obtain a binary code, tanh is selected as the activation function of the output layer, which outputs values between −1 and 1. We then used the sign function sgn to quantize the continuous 128-dimensional output into a discrete binary code, as in formula (1).

$$ \operatorname{sgn}\left( x \right) = \begin{cases} -1, & x < 0 \\ 1, & x \ge 0 \end{cases} $$
(1)
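As a minimal sketch of this quantization step, the following applies tanh and then formula (1) to a short list of values. The names `sgn` and `binarize` and the four input values are illustrative assumptions; a real DHPN output would have 128 entries.

```python
import math


def sgn(x):
    """Sign function of formula (1): -1 for negative input, +1 otherwise (including 0)."""
    return -1 if x < 0 else 1


def binarize(activations):
    """Squash raw last-layer values into (-1, 1) with tanh, then quantize each to -1 or +1."""
    relaxed = [math.tanh(a) for a in activations]
    return [sgn(r) for r in relaxed]


# Hypothetical last-layer values, shortened to 4 entries for readability.
print(binarize([2.3, -0.7, 0.0, -4.1]))  # [1, -1, 1, -1]
```

Note that an output of exactly 0 maps to +1, following the convention in formula (1).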

However, in our experiments we found that, because VGG-16 has too many network parameters, overfitting often occurs on the palm vein test set, resulting in inferior matching accuracy and long training time. Therefore, considering these drawbacks and the small amount of data available for palm vein recognition, we chose a lighter-weight network, CNN-F.

The structure of CNN-F is similar to AlexNet and consists of 5 convolution layers and 3 fully connected layers [25]. To achieve higher accuracy and better coding performance in palm vein identification, we propose DHPN based on the CNN-F structure. The detailed configuration is shown in Table 1. The parameters indicate the convolution stride (“st.”), spatial padding (“pad”), Local Response Normalization (LRN), Batch Normalization (BN) and the max-pooling down-sampling factor (“pool”).

Table 1. The detailed configuration of DHPN.

3.3 Loss Function

It is important to design an appropriate loss function for DHPN. The literature [26] points out that controlling the quantization error and cross-entropy loss can effectively promote the network performance. Hence, the presented loss function for palm vein recognition task is based on two components, hash loss and quantization loss.

Hash Loss.

To ensure that similar pictures have similar codes, we designed a hash loss based on pairwise similarity. We define the pairwise similarity matrix as \( P_{N \times N} \), where \( P_{ij} \) represents the relevance of the ith and jth pictures. When \( P_{ij} \) equals 1, the two pictures belong to the same class; otherwise \( P_{ij} \) is 0, indicating that the two pictures belong to different classes. The hash loss of the two pictures can then be expressed as follows.

$$ J\left( U_{i}, U_{j}, P_{ij} \right) = \frac{1}{2} P_{ij} D_{h}\left( U_{i}, U_{j} \right) + \frac{1}{2}\left( 1 - P_{ij} \right) \max\left( M - D_{h}\left( U_{i}, U_{j} \right), 0 \right) $$
(2)

\( U_{i} \) and \( U_{j} \) denote the DHPN outputs of the ith and jth images, respectively, and \( D_{h}(U_{i}, U_{j}) \) represents the Hamming distance between the two encoded outputs. In formula (2), M is the distance threshold: when the ith and jth images are not from the same category and \( D_{h}(U_{i}, U_{j}) \) is greater than M, the Hamming distance between the two images is already large enough, so the second term is zero and no further separation is needed. In the experiment, M is set to 180.

If the training set contains N vein images, the total hash loss can be written as

$$ J_{H} = \sum\nolimits_{i = 1}^{N} {\sum\nolimits_{j = 1}^{N} {J\left( {U_{i} , U_{j} ,P_{ij} } \right)} } $$
(3)
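Formulas (2) and (3) can be sketched in a few lines. The sketch below evaluates the loss on codes that are already binarized to ±1; how the Hamming distance is computed on continuous tanh outputs during training is not specified in the text, so that part is left out. The function names, the 4-bit toy codes and the margin of 4 are illustrative assumptions (the experiments use M = 180).

```python
def hamming(u, v):
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def pair_loss(u_i, u_j, p_ij, margin):
    """Formula (2): pull same-class codes together, push different-class codes apart up to the margin."""
    d = hamming(u_i, u_j)
    return 0.5 * p_ij * d + 0.5 * (1 - p_ij) * max(margin - d, 0)


def total_hash_loss(codes, labels, margin):
    """Formula (3): sum the pairwise loss over all ordered pairs (i, j), including i == j."""
    total = 0.0
    for i in range(len(codes)):
        for j in range(len(codes)):
            p_ij = 1 if labels[i] == labels[j] else 0
            total += pair_loss(codes[i], codes[j], p_ij, margin)
    return total


# Toy example: two images of person 0 and one image of person 1.
codes = [[1, 1, -1, -1], [1, 1, -1, 1], [-1, -1, 1, 1]]
labels = [0, 0, 1]
print(total_hash_loss(codes, labels, margin=4))  # 2.0
```

In this toy case the same-class pair differs in one bit and contributes 0.5 in each order, the different-class pair at distance 3 falls one bit short of the margin and also contributes 0.5 in each order, and the pair at distance 4 already meets the margin and contributes nothing.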

Quantization Loss.

If the output of the last layer of the network is randomly distributed, binarizing it through the tanh and sgn functions will inevitably lead to a large quantization error [26]. To reduce the quantization error, we define the following quantization loss JQ, which pushes each output closer to 1 or −1.

$$ J_{Q} = \sum\nolimits_{i = 1}^{N} {\frac{1}{2}\left( { \left\| {1 - \left| {U_{i} } \right| } \right\|_{2}} \right)} $$
(4)

Here \( |U_{i}| \) denotes the absolute value of \( U_{i} \) and \( \| \cdot \|_{2} \) stands for the L2-norm of a vector.

Thus, we can obtain the following optimization formula, where α indicates the scale factor.

$$ {\text{min }}J = \alpha J_{H} + J_{Q} $$
(5)
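Formulas (4) and (5) can be illustrated with the following minimal sketch. The absolute value is taken element-wise, the function names are illustrative assumptions, and the default α = 2 is the balance factor reported in Sect. 4.1; the hash loss value passed to `objective` is a placeholder number.

```python
import math


def quantization_loss(outputs):
    """Formula (4): half the L2-norm of (1 - |U_i|), summed over all network outputs U_i."""
    total = 0.0
    for u in outputs:
        residual = [1.0 - abs(x) for x in u]
        total += 0.5 * math.sqrt(sum(r * r for r in residual))
    return total


def objective(hash_loss, quant_loss, alpha=2.0):
    """Formula (5): weighted sum of the hash loss and the quantization loss."""
    return alpha * hash_loss + quant_loss


# Outputs already at +/-1 incur no quantization loss; outputs at 0 incur the most.
print(quantization_loss([[1.0, -1.0, 1.0, -1.0]]))  # 0.0
print(quantization_loss([[0.0, 0.0, 0.0, 0.0]]))    # 1.0
print(objective(hash_loss=2.0, quant_loss=1.0))     # 5.0
```

The two extreme cases show why this term reduces quantization error: the loss is zero only when every output component already sits at 1 or −1, so sgn changes nothing.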

4 Experiments and Results

In this section, we briefly introduce the palm vein databases and the relevant comparison experiments. The databases include the public Hong Kong Polytechnic University (PolyU) palm vein database and our self-built Xi’an Jiaotong University (XJTU) palm vein database. In the comparative experiments, we also varied the network structure, training ratio, code bits and database to discuss how these variables affect the performance of DHPN.

4.1 Experimental Database and Setting

At present, the PolyU palm vein database is a representative database for palm vein recognition [27]. It consists of 6000 near-infrared palm vein images of 128 × 128 pixels from 500 individuals. In the PolyU database, the palm vein images of each hand were collected in two sessions, with 6 images collected in each session, giving 12 palm vein images per person in total. Samples are shown in Fig. 2.

Fig. 2. Typical palm vein ROI images of three people in PolyU database

In the experiment, we used the DHPN structure described in Sect. 3.2 with an exponentially decaying learning rate. The parameter M was set to 180 and the balance factor α was set to 2. For training, we chose a training ratio of 50%, which means 3000 images were used as the training set and the other 3000 images as the testing set. By optimizing the loss function J, we obtained the network parameters after 8000 training steps. Then, by feeding all 6000 original palm vein images into DHPN, we obtained their 128-bit binary codes. Finally, the Hamming distances between binary codes were calculated for the genuine and imposter matches, and a similarity threshold was applied to decide the recognition result. By varying the threshold, we obtained the Receiver Operating Characteristic (ROC) curve shown in Fig. 3, which gives an EER of 0.0222%.

Fig. 3. ROC curve and distribution of genuine and imposter matches (Color figure online)
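The threshold sweep behind the ROC curve and the EER can be sketched as follows. This is a simplified illustration, not the evaluation code used in the experiments: the function names and the toy distance lists are assumptions, and the EER is approximated at the integer threshold where the false acceptance rate (FAR) and false rejection rate (FRR) are closest.

```python
def error_rates(genuine, imposter, threshold):
    """A pair is accepted as the same person when its distance is strictly below the threshold."""
    far = sum(1 for d in imposter if d < threshold) / len(imposter)   # imposters accepted
    frr = sum(1 for d in genuine if d >= threshold) / len(genuine)    # genuine pairs rejected
    return far, frr


def equal_error_rate(genuine, imposter, max_distance):
    """Sweep integer thresholds and return (EER estimate, threshold) where FAR and FRR are closest."""
    best = None
    for t in range(max_distance + 2):
        far, frr = error_rates(genuine, imposter, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical Hamming distances for a handful of genuine and imposter matches.
genuine = [3, 5, 8, 12]
imposter = [10, 40, 55, 60]
print(equal_error_rate(genuine, imposter, max_distance=128))  # (0.25, 11)
```

When the two distributions do not overlap at all, some threshold yields FAR = FRR = 0, which corresponds to an EER of 0%.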

On the PolyU dataset, we matched each test image against all training images. With a 50% training ratio, there are 18,000 genuine matches and 8,982,000 imposter matches in total. The distribution of the Hamming distances of all matches is shown in Fig. 3. The red and blue curves represent the distributions of genuine and imposter matches, respectively. As can be seen from the figure, genuine and imposter matches can be clearly separated by a reasonable threshold.
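The match counts above follow from simple counting, assuming every test image is compared with every training image and each of the 500 identities contributes 6 training and 6 test images. The function name `match_counts` is an illustrative assumption.

```python
def match_counts(num_identities, images_per_identity, train_per_identity):
    """Count genuine and imposter pairs when each test image is matched against all training images."""
    test_per_identity = images_per_identity - train_per_identity
    num_train = num_identities * train_per_identity
    num_test = num_identities * test_per_identity
    total_pairs = num_test * num_train
    # A pair is genuine when the test and training images share the same identity.
    genuine = num_identities * test_per_identity * train_per_identity
    return genuine, total_pairs - genuine


print(match_counts(500, 12, 6))  # (18000, 8982000)
```

That is, 3000 × 3000 = 9,000,000 comparisons in total, of which 500 × 6 × 6 = 18,000 are genuine.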

4.2 Experimental Results

Comparison of Network Structure.

In Sect. 3.2, we explained that the network structure has a great impact on matching accuracy. In this section, we tried three different network structures on the PolyU database and evaluated their performance by measuring the EER, as shown in Table 2.

Table 2. Comparison of different networks

The results show that the pre-trained VGG-16 with fine-tuning cannot achieve good accuracy and thus does not demonstrate the advantage of deep learning. We then ran the matching experiment with the original CNN-F network and with the modified CNN-F network, DHPN, and the results show that DHPN performs outstandingly on the PolyU database. Among the networks compared, the proposed DHPN achieves the lowest EER.

Training Ratio and Code Bits.

Next, we considered that the number of encoding bits and the training ratio would also affect the accuracy of DHPN. Because each person in the PolyU database has 12 images, the training-to-test ratios are set to 3:9, 6:6 and 9:3, respectively. The comparative results for different training ratios and code bits are shown in Table 3.

From Table 3, we can observe that the larger the training-to-test ratio, the lower the EER, and the best EER reaches 0% in 128 bits. Because DHPN is trained in a supervised manner, the network learns image features better from a larger training set, which explains why larger training-to-test ratios yield lower EERs. At the same time, as the number of encoding bits grows, more image information is encoded, which also lowers the EER.

Table 3. Comparison of different training ratio and code bits

Experiment on Different Databases.

Because the PolyU database is representative and convenient for comparison, the previous experiments were all conducted on PolyU. To measure generalization performance, we also tested the proposed DHPN on our self-built palm vein database (XJTU), which contains 600 images of 60 people. A detailed description and the experimental results for the two databases are given in Table 4.

Table 4. Comparison of different palm vein database

It can be seen from Table 4 that the PolyU database was collected in an ideal laboratory environment, which cannot represent real-world conditions. Therefore, we established our own database to simulate a more practical environment for acquiring palm vein images, at the cost of a higher EER than on the PolyU database. Nevertheless, DHPN still achieves a good result of EER = 1.333%, which substantiates the effectiveness of our database and the promising generalization ability of DHPN in palm vein recognition.

Contrast with the State-of-Art Methods.

Finally, we compared our DHPN method with other state-of-the-art palm vein recognition methods, as shown in Table 5.

Table 5. Comparison of the state-of-art methods on PolyU database

It can be seen that the proposed DHPN method achieves the lowest EER among the state-of-the-art methods compared. We therefore conclude that the supervised deep learning structure of DHPN leads to a stronger feature learning ability for palm vein recognition, and that its hash features lead to higher recognition accuracy, which gives palm vein recognition advantages in robustness, security and breadth of application scenarios. DHPN can thus be considered an effective and promising palm vein recognition method.

5 Conclusion

For the palm vein recognition task, this paper presents an end-to-end deep hashing palm vein network named DHPN. The modified CNN-F architecture is used to extract vein features, and a fixed-length binary hash code is obtained by applying the sgn function to the network output. By measuring the Hamming distance between two binary codes, we can determine whether two input palm vein images are from the same person. The experimental results show that, on the PolyU database, our network reaches an EER of 0% at its best. We also conducted several comparative experiments to discuss the effects of network structure, code bits, training ratio and database. In conclusion, DHPN offers strong image feature learning ability, broad application scenarios and high recognition accuracy, and as an end-to-end deep hashing method it also eliminates manual design steps and reduces feature extraction time. In future work, we will further study deep hashing to improve the accuracy of palm vein recognition, especially by drawing on image retrieval knowledge, and test our algorithm on a larger database.