An effective biometric discretization approach to extract highly discriminative, informative, and privacy-protective binary representation
Abstract
Biometric discretization derives a binary string for each user based on an ordered set of biometric features. This representative string ought to be discriminative, informative, and privacy-protective when it is employed as a cryptographic key in various security applications upon error correction. However, it is commonly believed that satisfying the first and the second criteria simultaneously is not feasible, and that a tradeoff between them is unavoidable. In this article, we propose an effective fixed bit-allocation-based discretization approach which involves discriminative feature extraction, discriminative feature selection, unsupervised quantization (quantization that does not utilize class information), and linearly separable subcode (LSSC)-based encoding to fulfill all the ideal properties of a binary representation extracted for cryptographic applications. In addition, we examine a number of discriminative feature-selection measures for discretization and identify the proper way of setting an important feature-selection parameter. Encouraging experimental results support the feasibility of our approach.
Keywords
biometric discretization, quantization, feature selection, linearly separable subcode, encoding
1. Introduction
Binary representation of biometrics has been receiving an increased amount of attention and demand in the last decade, ever since biometric security schemes were widely proposed. Security applications such as biometric-based cryptographic key generation schemes [1, 2, 3, 4, 5, 6, 7] and biometric template protection schemes [8, 9, 10, 11, 12, 13] require biometric features to be present in binary form before they can be implemented in practice. However, since security is a concern, these applications require the binary biometric representation to be:

Discriminative: Binary representation of each user ought to be highly representative and distinctive so that it can be derived as reliably as possible upon every query request of a genuine user and will neither be misrecognized as others nor extractable by any non-genuine user.

Informative: Information or uncertainty contained in the binary representation of each user should be made adequately high. In fact, the use of a huge number of equal-probable binary outputs creates a huge key space which could render an attacker clueless in guessing the correct output during a brute force attack. This is essential in security provision, as a malicious impersonation could take place in a straightforward manner if the correct key can be obtained by the adversary with an overwhelming probability. Entropy is a common measure of uncertainty, and it is usually a biometric system specification. By denoting the entropy of a binary representation as L, it can then be related to the N number of outputs with probability p_{i} for i = {1,...,N} by $L=-{\sum}_{i=1}^{N}{p}_{i}{\log}_{2}{p}_{i}$. If the outputs are equal-probable, then the resultant entropy is maximal, that is, L = log_{2}N. Note that the current encryption standard based on the advanced encryption standard (AES) is specified to be 256-bit entropy, signifying that at least 2^{256} possible outputs are required to withstand a brute force attack at the current state of the art. With consistent technological advancement, adversaries will become more and more powerful as the capability of computers grows. Hence, it is of utmost importance to derive highly informative binary strings to cope with rising encryption standards in the future.

Privacy-protective: To avoid devastating consequences upon compromise of the irreplaceable biometric features of every user, the auxiliary information used for bit-string regeneration must not be correlated to the raw or projected features. In the case of system compromise, such non-correlation of the auxiliary information should be guaranteed to impede any adversarial reverse engineering attempt at obtaining the raw features. Otherwise, it is no different from storing the biometric features in the clear in the system database.
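As a small numerical check of the entropy expression given under the informative criterion above, the following sketch computes L for an equal-probable and a skewed output distribution. The function name `shannon_entropy` and the two example distributions are illustrative assumptions, not part of the original study.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits: L = -sum_i p_i * log2(p_i); zero-probability outputs contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 8  # hypothetical number of possible binary outputs

# Equal-probable outputs: entropy should reach the maximum log2(N).
uniform = [1.0 / N] * N

# Skewed outputs (illustrative only): one output takes half of the probability mass.
skewed = [0.5] + [0.5 / (N - 1)] * (N - 1)

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)

print(h_uniform, math.log2(N))
print(h_skewed)
```

Under these assumptions the equal-probable case attains log_{2}N bits, while any skew lowers the entropy, which is why equal-probable outputs are preferred.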
In general, most biometric discretization schemes can be decomposed into two essential components, which can be alternatively described as a two-stage mapping process:

Quantization: The first component can be seen as a continuous-to-discrete mapping process. Given a set of feature elements per user, every one-dimensional feature space is initially constructed and segmented into a number of non-overlapping intervals, each of which is associated with a decimal index.

Encoding: The second component can be regarded as a discrete-to-binary mapping process, where the resultant index of each dimension is mapped to a unique n-bit binary codeword of an encoding scheme. Next, the codeword output of every feature dimension is concatenated to form the final bit string of a user. The discretization performance is finally evaluated in the Hamming domain.
These two components are governed by a static or a dynamic bit-allocation algorithm, determining whether the quantity of binary bits allocated to every dimension is fixed or varied, respectively. Besides, if the (genuine and/or imposter) class information is used in determining the cut points (interval boundaries) of the non-overlapping quantization intervals, the discretization is known as supervised discretization [1, 3, 16]; otherwise, it is referred to as unsupervised discretization [7, 17, 18, 19].
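The two-stage mapping described above can be sketched as follows. This is a minimal illustration that assumes a static allocation of n bits per dimension, equal-width quantization over a known range, and direct binary encoding of the interval index; the function names, the range [0, 1], and the sample feature values are hypothetical.

```python
def quantize_equal_width(value, low, high, num_intervals):
    """Stage 1 (continuous-to-discrete): return the index of the interval containing value."""
    width = (high - low) / num_intervals
    index = int((value - low) // width)
    # Clamp values that fall outside [low, high) to the boundary intervals.
    return min(max(index, 0), num_intervals - 1)

def encode_direct_binary(index, n_bits):
    """Stage 2 (discrete-to-binary): map an interval index to an n-bit codeword."""
    return format(index, "0{}b".format(n_bits))

def discretize(features, low, high, n_bits):
    """Fixed bit allocation: every dimension gets n_bits, codewords are concatenated."""
    num_intervals = 2 ** n_bits
    codewords = [
        encode_direct_binary(quantize_equal_width(v, low, high, num_intervals), n_bits)
        for v in features
    ]
    return "".join(codewords)

# Three illustrative feature values in [0, 1), two bits per dimension.
bits = discretize([0.1, 0.55, 0.9], 0.0, 1.0, 2)
print(bits)  # interval indices 0, 2, 3
```

Because the cut points here depend only on the fixed range and not on any class information, this toy quantizer would count as unsupervised in the terminology above.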
On the other hand, information about the constructed intervals of each dimension is stored as the helper data during enrolment so as to assist reproducing the same binary string of each genuine user during the verification phase. However, similar to the security and the privacy requirements of the binary representation, it is important that such helper data, upon compromise, should neither leak any helpful information about the output binary string (security concern), nor the biometric feature itself (privacy concern).
1.1 Previous works
Over the last decade, numerous biometric discretization techniques for producing a binary string from a given set of features of each user have been reported. These schemes are based upon either a fixed bit-allocation principle (assigning a fixed number of bits to each feature dimension) [4, 5, 6, 7, 10, 13, 16, 20] or a dynamic bit-allocation principle (assigning a different number of bits to each feature dimension) [1, 3, 17, 18, 19, 21].
Monrose et al. [4, 5], Teoh et al. [6], and Verbitsky et al. [13] partition each feature space into two intervals (labeled by '0' and '1') based on a prefixed threshold. Tuyls et al. [12] and Kevenaar et al. [9] have used a similar 1-bit discretization technique, but instead of fixing the threshold, the mean of the background probability density function (for modeling inter-class variation) is selected as the threshold in each dimension. Further, reliable components are identified based on either the training bit statistics [12] or a reliability (RL) function [9] so that unreliable dimensions can be eliminated from bit extraction.
Kelkboom et al. have analytically expressed the genuine and imposter bit error probability [22] and subsequently modeled a discretization framework [23] to analytically estimate the genuine and imposter Hamming distance probability mass functions (pmf) of a biometric system. This model is based upon a static 1-bit equal-probable discretization under the assumption that both intra-class and inter-class variations are Gaussian distributed.
Han et al. [20] proposed a discretization technique to extract a 9-bit PIN from each user's fingerprint impressions. The discretization derives the first 6 bits from six pre-identified reliable/stable minutiae: if a minutia is a bifurcation, a bit "0" is assigned; otherwise, if it is a ridge ending, a bit "1" is assigned. The last 3 bits are derived by a single-bit discretization on each of three triangular features. If the biometric password/PIN is used directly as a cryptographic key in security applications, it will be too short to survive brute force attacks, as an adversary would require at most 2^{9} = 512 attempts to crack the biometric password.
Hao and Chan [3] and Chang et al. [1] employed a multi-bit supervised user-specific biometric discretization scheme, each with a different interval-handling technique. Both schemes initially fix the position of the genuine interval of each dimension around the modeled pdf of the j th user: [μ_{j} − kσ_{j}, μ_{j} + kσ_{j}] and then construct the remaining intervals based on a constant width of 2kσ_{j} within every feature space. Here, μ_{j} and σ_{j} denote the mean and standard deviation (SD) of the user pdf, respectively, and k is a free parameter. As for the boundary portions at both ends of each feature space, Hao and Chan unfold every feature space arbitrarily to include all the remaining possible feature values in forming the leftmost and rightmost boundary intervals. Then, all the constructed intervals are labeled with direct binary representation (DBR) encoding elements (e.g., 3_{10} → 011_{2}, 4_{10} → 100_{2}, 5_{10} → 101_{2}). On the other hand, Chang et al. extend each feature space to account for the extra equal-width intervals to form 2^{n} intervals in accordance with the entire set of 2^{n} codeword labels from each n-bit DBR encoding scheme.
Although both these schemes are able to generate binary strings of arbitrary length, they turn out to be greatly inefficient, since the ad hoc interval-handling strategies may result in considerable leakage of entropy, which will jeopardize the security of the users. In particular, the non-feasible labels of all extra intervals (including the boundary intervals) would allow an adversary to eliminate the corresponding codeword labels from her or his output-guessing range after observing the helper data, or after reliably identifying the "fake" intervals. Apart from this security issue, another critical problem with these two schemes is the potential exposure of the exact location of each genuine user pdf. Based on the knowledge that the user pdf is located at the center of the genuine interval, the constructed intervals serve as a clue to the adversary as to where the user pdf could be located. As a result, the number of possible locations of the user pdf could be reduced to the number of quantization intervals in that dimension, thus potentially facilitating malicious privacy violation attempts.
Chen et al. [16] demonstrated a likelihood-ratio-based multi-bit biometric discretization scheme which is likewise supervised and user specific. The quantization scheme first constructs the genuine interval to accommodate the likelihood ratio (LR) detected in that dimension and creates the remaining intervals in an equal-probable (EP) manner so that the background probability mass is equally distributed within every interval. The leftmost and rightmost boundary intervals with insufficient background probability mass are wrapped into a single interval that is tagged with a common codeword label from the binary reflected gray code (BRGC)-encoding scheme [24] (e.g., 3_{10} → 010_{2}, 4_{10} → 110_{2}, 5_{10} → 111_{2}). This discretization scheme suffers from the same privacy problem as the previous supervised schemes because the genuine interval is constructed based on user-specific information.
Yip et al. [7] presented an unsupervised, non-user-specific, multi-bit discretization scheme based on equal-width interval quantization and BRGC encoding. This scheme adopts the entire BRGC code for labeling, and therefore, it is free from the entropy loss problem. Furthermore, since it does not make use of the user pdf to determine the cut points of the quantization intervals, this scheme does not seem to suffer from the aforementioned privacy problem.
Teoh et al. [18, 19] developed a bit-allocation approach based on an unsupervised equal-width quantization with a BRGC-encoding scheme to compose a long binary string per user by assigning a different number of bits to each feature dimension according to the SD of each estimated user pdf. Particularly, the intention is to assign a larger quantity of binary bits to discriminative dimensions and a smaller quantity otherwise. In other words, the larger the SD of a user pdf is detected to be, the fewer bits will be assigned to that dimension, and vice versa. Nevertheless, the length of the binary string is not decided based on the actual position of the pdf itself in the feature space. Although this scheme is invulnerable to the privacy weakness, such a deciding strategy gives a less accurate bit allocation: a user pdf falling across an interval boundary may result in an undesired intra-class variation in the Hamming domain and thus should not be prioritized for bit extraction. Another concern is that pure SD might not be a promising discriminative measure.
Chen et al. [17] introduced another dynamic bit-allocation approach by considering detection rate (DR) (user probability mass captured by the genuine interval) as their bit-allocation measure. The scheme, known as DR-optimized bit-allocation (DROBA), employs an equal-probable quantization interval construction with BRGC encoding. Similar to Teoh et al.'s dynamic bit-allocation scheme, this scheme assigns more bits to more discriminative feature dimensions and vice versa. Recently, Chen et al. [21] developed a similar dynamic bit-allocation algorithm based on optimizing a different bit-allocation measure: area under the FRR curve. Given the bit-error probability, the scheme allocates bits dynamically to every feature component in a similar way as DROBA, except that the analytic area under the FRR curve for Hamming distance evaluation is minimized instead of DR being maximized.
1.2 Motivation and contributions
This article focuses on discretization based upon the fixed bit-allocation principle. We extend the study of [25] to tackle the open problem of generating desirable binary strings that are simultaneously highly discriminative, informative, and privacy-protective by means of discretization based on LSSC. Specifically, we adopt a discriminative feature extraction with a further feature selection to extract discriminative feature components; an unsupervised quantization approach to offer promising privacy protection; and an LSSC encoding to achieve large entropy without having to sacrifice the actual classification accuracy of the discriminative feature components. Note that the preliminary idea of this article has appeared in the context of global discretization [26] for achieving strong security and privacy protection with high training efficiency.
a) We propose a fixed bit-allocation-based discretization approach to extract a binary representation which is able to fulfill all the required criteria from each given set of user-specific features.
b) As required by our approach, we study empirically various discriminative measures that have been put forward for feature selection and identify the reliable ones among them.
c) We identify and analyze factors that influence improvements resulting from the discriminative selection based on the respective measures.
This article is organized as follows. In the next section, the efficiency of using LSSC over BRGC and DBR for encoding is highlighted. In Section 3, our approach to generating a desirable binary representation is described in detail. In Section 4, experimental results justifying the effectiveness of our approach are presented. Finally, concluding remarks are provided in Section 5.
2. The emergence of LSSC
2.1 The security-performance tradeoff of DBR and BRGC
A collection of n_{DBR}-bit DBRs and n_{BRGC}-bit BRGCs for S = 8 and 16, with [τ] indicating the codeword index.
DBR, n_{DBR} = 3, S = 8 | DBR, n_{DBR} = 4, S = 16 | BRGC, n_{BRGC} = 3, S = 8 | BRGC, n_{BRGC} = 4, S = 16
[0] 000 | [0] 0000, [8] 1000 | [0] 000 | [0] 0000, [8] 1100
[1] 001 | [1] 0001, [9] 1001 | [1] 001 | [1] 0001, [9] 1101
[2] 010 | [2] 0010, [10] 1010 | [2] 011 | [2] 0011, [10] 1111
[3] 011 | [3] 0011, [11] 1011 | [3] 010 | [3] 0010, [11] 1110
[4] 100 | [4] 0100, [12] 1100 | [4] 110 | [4] 0110, [12] 1010
[5] 101 | [5] 0101, [13] 1101 | [5] 111 | [5] 0111, [13] 1011
[6] 110 | [6] 0110, [14] 1110 | [6] 101 | [6] 0101, [14] 1001
[7] 111 | [7] 0111, [15] 1111 | [7] 100 | [7] 0100, [15] 1000
Conventionally, a tradeoff between discretization performance and entropy length is inevitable when DBR or BRGC is adopted as the encoding scheme. The underlying reason was identified to be the imprecise discrete-to-binary mapping behavior during the discretization process, since the encoding scheme affects only how each index of the quantization intervals is mapped to a unique binary codeword. More precisely, one may notice that multiple DBR as well as BRGC codewords share a common Hamming distance with respect to any reference codeword in the code for n_{DBR} and n_{BRGC} ≥ 2, so that imposter feature elements that are initially well separated from a genuine feature element in the index space may be mapped much nearer to it in the Hamming space than they should be. Taking 4-bit DBR-based discretization as an example, the interval labelled with "1000", located 8 intervals away from the reference interval "0000", is eventually mapped to a Hamming distance of one in the Hamming space. Worse for BRGC, interval "1000" is located even further (15 intervals away) from interval "0000". As a result, imposter feature components might be misclassified as genuine in the Hamming domain, and eventually the discretization performance would be greatly impeded by such an imprecise discrete-to-binary map. In fact, this defect becomes more critical as the required entropy increases, or as S increases [25].
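The imprecise mapping described above can be reproduced with a few lines of code. The sketch below generates 4-bit DBR and BRGC codewords (using the standard index XOR index-shifted-right construction for the reflected Gray code) and lists every interval index whose codeword lies at Hamming distance 1 from the reference codeword "0000". The helper names are illustrative assumptions.

```python
def dbr(index, n_bits):
    """Direct binary representation of an interval index."""
    return format(index, "0{}b".format(n_bits))

def brgc(index, n_bits):
    """Binary reflected Gray code of an interval index (index XOR index >> 1)."""
    return format(index ^ (index >> 1), "0{}b".format(n_bits))

def hamming(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

n = 4
S = 2 ** n

# Distance of the far-away intervals mentioned in the text from reference index 0.
dbr_d = hamming(dbr(0, n), dbr(8, n))      # "0000" vs "1000", 8 intervals apart
brgc_d = hamming(brgc(0, n), brgc(15, n))  # "0000" vs "1000", 15 intervals apart

# All interval indices that collapse to Hamming distance 1 from index 0.
dbr_neighbors = [i for i in range(S) if hamming(dbr(0, n), dbr(i, n)) == 1]
brgc_neighbors = [i for i in range(S) if hamming(brgc(0, n), brgc(i, n)) == 1]

print(dbr_d, brgc_d)
print(dbr_neighbors, brgc_neighbors)
```

In both codes, four different intervals share Hamming distance 1 from the reference, including intervals that are 8 (DBR) and 15 (BRGC) positions away in the index space.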
2.2 LSSC
Linearly separable subcode (LSSC) [25] was put forward to tackle the aforementioned inability of DBR and BRGC to fully preserve the separation of feature points in the index domain when the eventual distance evaluation is performed in the Hamming domain. This code utilizes redundancy to augment the separability in the Hamming space, enabling a one-to-one correspondence between every non-reference codeword and the Hamming distance incurred with respect to every possible reference codeword.
A collection of n_{LSSC}-bit LSSCs for S = 4, 8 and 16, where [τ] denotes the codeword index.
n_{LSSC} = 3, S = 4 | n_{LSSC} = 7, S = 8 | n_{LSSC} = 15, S = 16
[0] 000 | [0] 0000000 | [0] 000000000000000, [8] 000000011111111
[1] 001 | [1] 0000001 | [1] 000000000000001, [9] 000000111111111
[2] 011 | [2] 0000011 | [2] 000000000000011, [10] 000001111111111
[3] 111 | [3] 0000111 | [3] 000000000000111, [11] 000011111111111
 | [4] 0001111 | [4] 000000000001111, [12] 000111111111111
 | [5] 0011111 | [5] 000000000011111, [13] 001111111111111
 | [6] 0111111 | [6] 000000000111111, [14] 011111111111111
 | [7] 1111111 | [7] 000000001111111, [15] 111111111111111
The amount of bit disagreement, or equivalently the Hamming distance between any pair of codewords, is the same as the corresponding positive index difference. For a 3-bit LSSC, as an example, the Hamming distance between codewords "111" and "001" is 2, which is equal to the difference between the codeword indices "3" and "1". It is in general not difficult to observe that neighbouring codewords have a smaller Hamming distance than distant codewords. Thus, unlike DBR and BRGC, LSSC ensures that every distance in the index space is thoroughly preserved in the Hamming space, despite the large bit redundancy a system might need to afford. As reported in [25], increasing the entropy per dimension has a negligible effect on discretization performance through the employment of LSSC, provided that the quantity of quantization intervals constructed in each dimension is not too small. Instead, the entropy now becomes a function of the bit redundancy incurred.
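The distance-preserving property of LSSC can be verified exhaustively for a small code. The sketch below builds the codewords following the pattern in the LSSC table above (index τ is encoded as n_{LSSC} − τ zeros followed by τ ones) and checks that the Hamming distance equals the index difference for every pair. The function names are illustrative assumptions.

```python
def lssc(index, n_bits):
    """LSSC codeword for an interval index: (n_bits - index) zeros followed by index ones."""
    return "0" * (n_bits - index) + "1" * index

def hamming(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

S = 8           # number of quantization intervals
n_bits = S - 1  # LSSC needs S - 1 bits to label S intervals

codebook = [lssc(i, n_bits) for i in range(S)]

# Check every pair of indices: Hamming distance must equal |a - b|.
preserved = all(
    hamming(codebook[a], codebook[b]) == abs(a - b)
    for a in range(S)
    for b in range(S)
)

print(codebook)
print(preserved)
print(hamming(lssc(3, 3), lssc(1, 3)))  # the "111" vs "001" example from the text
```

The price of this property is visible in the code length: labeling S intervals takes S − 1 bits, compared with log_{2}S bits for DBR or BRGC.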
3. Desirable bit string generation and the appropriate discriminative measures
i. [Feature Extraction] Employ a discriminative feature extractor ℑ(·) (e.g., Fisher's linear discriminant analysis (FDA) [27], Eigenfeature regularization and extraction (ERE) [28]) to ensure D quality features being extracted from R raw features;
ii. [Feature Selection] Select D_{fs} (D_{fs} < D < R) most discriminative feature components from a total of D dimensions according to a discriminative measure χ(·);
iii. [Quantization] Adopt an unsupervised equal-probable quantization scheme Q(·) to achieve strong privacy protection; and
iv. [Encoding] Employ LSSC for encoding ℰ_{LSSC}(·) to maintain such discriminative performance, while satisfying an arbitrary entropy requirement imposed on the resultant binary string.
This approach initially obtains a set of discriminative feature components in steps (i) and (ii), and produces an informative user-specific binary string (with large entropy) while maintaining the prior discriminative performance in steps (iii) and (iv). The privacy protection is offered by unsupervised quantization in step (iii), where the correlation of helper data with the user-specific data is insignificant. This makes our four-step approach capable of producing discriminative, informative, and privacy-protective binary biometric representation.
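Steps (iii) and (iv) can be sketched as follows, under the assumption that the background pdf of each selected dimension is Gaussian with known mean and SD, so that equal-probable cut points are simply background quantiles. The function names, the standard-normal background, and the three sample feature values are hypothetical; feature extraction and selection (steps (i) and (ii)) are taken as already done.

```python
from bisect import bisect_right
from statistics import NormalDist

def ep_cut_points(bg_mean, bg_sd, num_intervals):
    """Cut points that split the Gaussian background pdf into equal-probable intervals."""
    background = NormalDist(bg_mean, bg_sd)
    return [background.inv_cdf(k / num_intervals) for k in range(1, num_intervals)]

def quantize(value, cuts):
    """Index of the interval containing value, given sorted cut points."""
    return bisect_right(cuts, value)

def lssc(index, n_bits):
    """LSSC codeword: (n_bits - index) zeros followed by index ones."""
    return "0" * (n_bits - index) + "1" * index

def discretize_user(features, bg_params, num_intervals):
    """Quantize each selected feature with EP intervals and concatenate LSSC codewords."""
    n_bits = num_intervals - 1
    codewords = []
    for value, (bg_mean, bg_sd) in zip(features, bg_params):
        # The cut points depend only on the background statistics, not on the user.
        cuts = ep_cut_points(bg_mean, bg_sd, num_intervals)
        codewords.append(lssc(quantize(value, cuts), n_bits))
    return "".join(codewords)

# Three illustrative feature values against a standard-normal background, S = 4.
bits = discretize_user([-2.0, 0.1, 2.0], [(0.0, 1.0)] * 3, 4)
print(bits)
```

In this sketch the only stored auxiliary information would be the background-derived cut points, which are shared by all users; this is the sense in which the quantization is unsupervised.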
Among the steps, implementations of (i), (iii), and (iv) are fairly straightforward. The only uncertainty lies in the appropriate discriminative measure and the corresponding parameter D_{fs} in step (ii) for attaining the best performance. Note that step (ii) is included specifically to compensate for the limited performance caused by the employment of unsupervised quantization. Here, we introduce a number of discriminative measures that can be adopted for discretization and study the effectiveness of these measures in the next section.
3.1 Discriminative measures χ(·) for feature selection
The discriminativeness of each feature component is closely related to the well-known Fisher's linear discriminant criterion [27], where the discriminant criterion is defined to be the ratio of between-class variance (inter-class variation) to within-class variance (intra-class variation).
Suppose that we have J users enrolled in a biometric system, where each of them is represented by a total of D ordered feature elements ${v}_{ji}^{1},{v}_{ji}^{2},...,{v}_{ji}^{D}$ upon feature extraction from each measurement. In view of potential intra-class variation, the d th feature element of the j th user can be modeled from a set of measurements by a user pdf, denoted by ${f}_{j}^{d}\left(v\right)$ where d ∈ {1, 2,...,D}, j ∈ {1, 2,...,J} and v ∈ feature space ${V}^{d}$. On the other hand, owing to inter-class variation, the d th feature element of the measurements of the entire population can be modeled by a background pdf, denoted by f^{d}(v). Both distributions are assumed to be Gaussian according to the central limit theorem. That is, the d th-dimensional background pdf has mean μ^{d} and SD σ^{d}, while the j th user's d th-dimensional user pdf has mean ${\mu}_{j}^{d}$ and SD ${\sigma}_{j}^{d}$.
3.1.1. Likelihood ratio (χ= LR)
The remaining intervals are then constructed equal-probably, that is, with reference to the portion of background distribution captured by the genuine interval. Since different users will have different intervals constructed in each feature dimension, this discretization approach turns out to be user specific.
Therefore, adopting D_{fs} dimensions with maximum LR would be equivalent to selecting D_{fs} feature elements with maximum inter- over intra-class variation.
3.1.2. Signal-to-noise ratio (χ = SNR)
3.1.3. Reliability (χ = RL)
where erf is the error function. This RL measure would produce a higher value when a feature element has a larger difference between ${\mu}_{j}^{d}$ and μ^{ d } relative to ${\sigma}_{j}^{d}$. As a result, a high RL measurement indicates a high discriminating power of a feature component.
3.1.4. Standard deviation (χ = SD)
In dynamic discretization, the number of bits allocated to a feature dimension indicates how discriminative the user-specific feature component is detected to be. Usually, a more discriminative feature component is assigned a larger quantity of bits and vice versa. The pure user-specific SD measure ${\sigma}_{j}^{d}$, signifying intra-class variation, was adopted by Teoh et al. as a bit-allocation measure [18, 19] and hence may serve as a potential discriminative measure.
3.1.5. Detection rate (χ = DR)
where ${\delta}_{j}^{d}$ denotes the j th user's DR in the d th dimension and S^{ d } denotes the number of constructed intervals in the d th dimension.
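To make the DR measure concrete, the sketch below computes the user probability mass captured by the genuine interval, which is how DR was described in Section 1.1. It assumes Gaussian user and background pdfs, equal-probable intervals derived from the background pdf, and that the genuine interval is the one containing the user mean; these modeling choices, the function name, and the sample parameters are assumptions for illustration rather than the exact formulation used in the experiments.

```python
from bisect import bisect_right
from statistics import NormalDist

def detection_rate(user_mean, user_sd, bg_mean, bg_sd, num_intervals):
    """User probability mass inside the equal-probable interval that contains the user mean."""
    background = NormalDist(bg_mean, bg_sd)
    user = NormalDist(user_mean, user_sd)
    cuts = [background.inv_cdf(k / num_intervals) for k in range(1, num_intervals)]
    idx = bisect_right(cuts, user_mean)  # index of the assumed genuine interval
    # Boundary intervals extend to minus or plus infinity.
    lower_mass = user.cdf(cuts[idx - 1]) if idx > 0 else 0.0
    upper_mass = user.cdf(cuts[idx]) if idx < len(cuts) else 1.0
    return upper_mass - lower_mass

# A tight user pdf versus a wide one, same mean, standard-normal background, S = 4.
dr_tight = detection_rate(1.5, 0.05, 0.0, 1.0, 4)
dr_wide = detection_rate(1.5, 1.0, 0.0, 1.0, 4)

# The same user pdf under a coarse (S = 2) and a fine (S = 16) segmentation.
dr_coarse = detection_rate(0.1, 0.3, 0.0, 1.0, 2)
dr_fine = detection_rate(0.1, 0.3, 0.0, 1.0, 16)

print(dr_tight, dr_wide)
print(dr_coarse, dr_fine)
```

Under these assumptions, a smaller intra-class spread yields a higher DR, and a finer segmentation of the same dimension yields a lower DR, which is consistent with DR depending on the number of constructed intervals S^{d}.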
We shall empirically identify discriminative measures that can be reliably employed in the next section.
3.2 Discussions and a summary of our approach
4. Experiments and analysis
4.1. Experiment setup
Two popular face datasets are selected to evaluate the experimental discretization performance in this section:
FERET
The employed dataset is a subset of the FERET face dataset [29], in which the images were collected under varying illumination conditions and facial expressions. It contains a total of 1800 images with 12 images for each of 150 users.
FRGC
The adopted dataset is a subset of the FRGC dataset (version 2) [30], containing a total of 2124 images with 12 images for each of the 177 identities. The images were taken under controlled illumination conditions.
For both datasets, proper alignment is applied to the images based on standard face landmarks. Owing to possible strong variation in hair style, only the face region is extracted for recognition by cropping the images to the size of 30 × 36 for the FERET dataset and 61 × 73 for the FRGC dataset. Finally, histogram equalization is applied to the cropped images.
Half of each identity's images are used for training, while the remaining half are used for testing. For measuring the system's false acceptance rate (FAR), each image of the corresponding user is matched against that of every other user according to its corresponding image index, while for the false rejection rate (FRR) evaluation, each image is matched against every other image of the same user, for every user. In the subsequent experiments, the equal error rate (EER) (the error rate where FAR = FRR) is used for comparing the discretization performance among different discretization schemes, since it is a quick and convenient way to compare the accuracy of the discretizations. The performance is considered to be better when the EER is lower.
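For readers who wish to reproduce the evaluation metric, the sketch below estimates the EER from two lists of Hamming distances by sweeping an integer decision threshold and taking the point where FAR and FRR are closest. The acceptance rule (accept when the distance does not exceed the threshold), the function names, and the toy distance values are assumptions for illustration and are not data from the experiments.

```python
def far_frr(genuine, imposter, threshold):
    """FAR and FRR when a comparison is accepted if its Hamming distance <= threshold."""
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in imposter) / len(imposter)
    return far, frr

def equal_error_rate(genuine, imposter):
    """Sweep integer thresholds; return (EER estimate, threshold) where |FAR - FRR| is smallest."""
    best = None
    for t in range(0, max(genuine + imposter) + 1):
        far, frr = far_frr(genuine, imposter, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Average FAR and FRR in case they do not coincide exactly.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Toy Hamming distances (made up for illustration only).
genuine = [1, 2, 2, 3, 4, 6]
imposter = [5, 7, 8, 8, 9, 10]

eer, threshold = equal_error_rate(genuine, imposter)
print(eer, threshold)
```

With real score sets, FAR and FRR rarely meet exactly at an integer threshold, which is why the sketch reports the average of the two at the closest point.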
The experiments can be divided into three parts. The first part identifies the reliable discriminative feature-selection measures among those listed in the previous section. The second part examines the performance of our approach and illustrates that replacing LSSC with a DBR- or BRGC-encoding scheme in our approach would achieve a much poorer performance when high entropy is imposed, because of the conventional performance-entropy tradeoff of DBR- and BRGC-encoding-based discretization. The last part scrutinizes how one could attain reliable estimation of the parameter D_{fs} in achieving the highest possible discretization performance.
The experiments were carried out based on two different dimensionality-reduction techniques, ERE [28] and FDA [27], and two different datasets, FRGC and FERET. In the first two parts of the experiments, the 4453 raw dimensions of FRGC images and the 1080 raw dimensions of FERET images were both reduced to D = 100 dimensions. For the last part, the raw dimensions of images from both datasets were reduced to D = 50 and 100 dimensions for analytic purposes. Note that EP quantization was employed in all parts of the experiments.
4.2. Performance assessment
4.2.1. Experiment Part I: Identification of reliable feature-selection measures
With this, L = 100, 200, 300 and 400 correspond to l_{ fs } = n_{ fs } = 2, 4, 6 and 8 respectively, for D_{fs} = 50. This implies that the number of segmentation in each selected feature dimension is now larger than the usual case by a factor of ${2}^{n{n}_{\mathsf{\text{fs}}}}$.
from (10).
For the baseline discretization scheme of EP + LSSC with D = 100, L = Dl = D log_{2}(n_{LSSC} + 1) = 100 log_{2}(n_{LSSC} + 1). Thus, L = {100, 200, 300, 400} corresponds to l = {1, 2, 3, 4}, n_{LSSC} = {1, 3, 7, 15}, and the actual length of the extracted bit string is Dn_{LSSC} = {100, 300, 700, 1500}. For the feature-selection schemes with D_{fs} = 50, where L = D_{fs}l_{fs} = D_{fs} log_{2}(n_{LSSC(fs)} + 1) = 50 log_{2}(n_{LSSC(fs)} + 1), L = {100, 200, 300, 400} corresponds to l_{fs} = {2, 4, 6, 8}, n_{LSSC(fs)} = {3, 15, 63, 255}, and the actual length of the extracted bit string becomes D_{fs}n_{LSSC(fs)} = {150, 750, 3150, 12750}. The implication here is that when a particularly large entropy specification is imposed on a feature-selection scheme, a much longer LSSC-generated bit string will always be required.
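The bit-length figures above follow directly from L = D log_{2}(n_{LSSC} + 1) and can be recomputed with the short sketch below. The function name `lssc_lengths` is an illustrative assumption, and the sketch assumes that L is an integer multiple of the number of dimensions.

```python
def lssc_lengths(num_dims, total_entropy_bits):
    """Return (entropy per dimension l, LSSC codeword length n, total bit-string length)."""
    l = total_entropy_bits // num_dims  # bits of entropy per dimension
    n_lssc = 2 ** l - 1                 # LSSC needs 2^l - 1 bits to label 2^l intervals
    return l, n_lssc, num_dims * n_lssc

entropies = (100, 200, 300, 400)

# Baseline: all D = 100 dimensions are used.
baseline = [lssc_lengths(100, L) for L in entropies]

# Feature selection: only D_fs = 50 dimensions are kept.
selected = [lssc_lengths(50, L) for L in entropies]

for L, b, s in zip(entropies, baseline, selected):
    print(L, b, s)
```

Halving the number of dimensions doubles the entropy required per dimension, and because the LSSC length grows exponentially in that per-dimension entropy, the total string length grows from 1500 to 12750 bits at L = 400.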
A strong discretization performance achieved by a feature-selection scheme implies a reliable measure for estimating the discriminativeness of the features. In all the subfigures, it is noticed that the discretization schemes that select features based on the LR, RL, and DR measures give the best performance among the feature-selection schemes. RL seems to be the most reliable discriminative measure, followed by LR and DR. In contrast, SNR and SD turn out to be poor discriminative measures that could not guarantee any improvement compared to the baseline scheme.
When LSSC encoding in our four-step approach (see Section 3) is replaced with DBR in Figure 5Ia, Ib and with BRGC in Figure 5IIa, IIb, the RL-, LR-, and DR-based feature-selection schemes manage to outperform the respective baseline scheme at low L. However, in most cases, these DBR- and BRGC-encoding-based discretization schemes with feature selection are found to underperform their baseline eventually when a high entropy requirement is imposed. The reason is that the utilized dimensions in such feature-selection schemes are reduced by half, causing the partitioning on each feature space to be augmented more rapidly by a factor of ${2}^{n{n}_{\mathsf{\text{fs}}}}$ and thus yielding relatively increasing imprecision of discrete-to-binary mapping as the entropy requirement increases. For this reason, significant performance degradation with respect to the baseline can finally be noticed at L = 400 in Figure 5Ia, Ib, IIa. Hence, when entropy increases, the EER performance lines of the RL-, LR-, and DR-based feature-selection schemes usually have steeper increments (degradation) than that of the baseline.
On the other hand, in Figure 5IIIa, IIIb where LSSC encoding is adopted, it is observed that the RL-, LR-, and DR-based feature-selection schemes outperform their baseline consistently for all values of L, except for the DR-based feature-selection scheme when L ≤ 200 in Figure 5IIIa. This particularly justifies that the precise discrete-to-binary mapping of LSSC is essential to enable an effective feature-selection-incorporated discretization process when a large entropy requirement is imposed.
4.2.2. Experiment Part II: Performance evaluation of EP + LSSC discretization with RL-, LR-, and DR-based feature-selection capabilities
From the EER plots in Figure 6Ia, IIa, it is noticed that the DBR and BRGC baselines share a common behavior: the deterioration of EER performance as L, or l for every dimension, or proportionally S for every dimension, increases. Such an observation confirms the imprecise discrete-to-binary mapping of DBR- and BRGC-encoding-based discretization. Because the difference between any pair of interval indices is not equal to the Hamming distance incurred between the corresponding DBR and BRGC codeword labels, the separation of feature components in the Hamming domain will eventually become poorer when more and more segmentations are applied to each single-dimensional feature space.
On the other hand, the LSSC baseline has its performance stabilized, albeit with some trivial fluctuations, consistently in Figure 6IIa, and beyond L = 300 (l = 3) in Figure 6Ia. A similar performance trend (except with earlier stabilization, beyond L = 200 (l_{fs} = 4)) can be observed for LSSC-encoding-based discretization with LR-, RL-, and DR-based feature selection in these two subfigures. This observation implies that, irrespective of the entropy requirement imposed on the discretization output, the performance gained from discriminative feature selection can be reliably preserved. Therefore, along with the employment of an unsupervised quantization approach, binary strings that fulfil all three desired criteria (discriminative, informative, and privacy protective) can potentially be derived.
From both the EER and ROC plots in Figure 6, the performance curves of LSSC-encoding-based discretization with LR-, RL-, and DR-based feature selection are very close to one another. Such trivial performance discrepancies among them are probably caused by the slight fluctuation inherent to LSSC-based schemes as the entropy requirement is increased. At L = 300, the feature-selection schemes outperform the baseline by about 2% on average in Figure 6Ia and by about 8% in Figure 6IIa. At 0.1% FAR, approximately 5% GAR improvement in Figure 6Ib and 10% GAR improvement in Figure 6IIb are observed.
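For readers less familiar with the quantities reported here, the following sketch shows one simple way FAR, FRR (GAR = 1 - FRR), and an approximate EER could be obtained from Hamming distances between bit strings. It assumes a threshold-based Hamming-distance classifier; the function names and the toy distance values are invented for illustration and are not taken from the experiments.

```python
def far_frr(genuine, impostor, threshold):
    """A comparison is accepted when its Hamming distance is <= threshold."""
    far = sum(d <= threshold for d in impostor) / len(impostor)  # impostors accepted
    frr = sum(d > threshold for d in genuine) / len(genuine)     # genuine users rejected
    return far, frr

def approx_eer(genuine, impostor):
    """Sweep integer thresholds and return (threshold, EER estimate)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in range(max(genuine + impostor) + 1):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Toy Hamming distances (illustrative values only).
genuine = [2, 3, 3, 4, 5, 6, 7, 9]
impostor = [6, 8, 9, 10, 11, 12, 12, 13]

threshold, eer = approx_eer(genuine, impostor)
gar_at_threshold = 1 - far_frr(genuine, impostor, threshold)[1]
print(threshold, eer, gar_at_threshold)
```

With real data, the sweep would run over all genuine and impostor comparisons of a test set, and the GAR at a fixed FAR (such as 0.1%) would be read off at the largest threshold whose FAR does not exceed that value.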
For LSSC-encoding-based discretization, it is worth noting that the improvements of RL-, DR-, and LR-based discriminative feature selection in the FERET dataset are less significant than those in the FRGC dataset. This could be explained by the fact that the decision made by a feature-selection process on a given set of features may not be ideal, owing to imprecise pdf estimation from a limited number of training samples. Some indiscriminative feature dimensions may be mistakenly selected and, conversely, some significantly discriminative dimensions may be excluded by mistake for the same reason. Therefore, the extent to which feature selection influences a given baseline performance depends greatly on the accuracy of the pdf estimation, which can vary markedly across different extracted sets of features. In other words, the quality of the unselected feature dimensions decides the amount of improvement with respect to the baseline. If the excluded feature dimensions are truly the least discriminative dimensions, then the improvement will be the greatest. Otherwise, if the excluded feature dimensions are somewhat discriminative, the improvement will be minor or, even worse, performance deterioration could occur. This signifies that the user pdf should be modelled from as many representative training samples as possible to avoid such trivial-improvement or deterioration scenarios. The results also suggest that there are more less-discriminative ERE-extractable feature components in the FRGC dataset than in the FERET dataset, since the improvement attained in the FRGC-based experiment is generally higher than that in the FERET-based experiment when those less-discriminative components are precisely excluded.
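The selection step discussed above amounts to scoring every feature dimension on training data and keeping the D_{fs} best-scoring ones. The sketch below illustrates only that ranking mechanism. The score used here (variance of the user means divided by the mean within-user variance) is a stand-in chosen for brevity and is not the LR, RL, or DR measure; the function names and the toy training data are likewise hypothetical.

```python
from statistics import mean, pvariance

def separability(samples_by_user, dim):
    """Stand-in discriminative score for one dimension (NOT LR, RL, or DR):
    spread of the user means relative to the average within-user spread."""
    user_means = [mean(s[dim] for s in samples) for samples in samples_by_user]
    within = mean(
        pvariance([s[dim] for s in samples]) for samples in samples_by_user
    )
    return pvariance(user_means) / (within + 1e-12)

def select_dimensions(samples_by_user, d_fs):
    """Return the indices of the d_fs highest-scoring dimensions and all scores."""
    n_dims = len(samples_by_user[0][0])
    scores = [separability(samples_by_user, d) for d in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda d: scores[d], reverse=True)
    return sorted(ranked[:d_fs]), scores

# Toy training set: 3 users, 3 samples each, 3 feature dimensions.
# Dimension 0 separates the users well, dimension 1 hardly at all.
train = [
    [(0.0, 5.1, 1.0), (0.1, 4.0, 1.2), (-0.1, 6.2, 0.9)],
    [(2.0, 5.0, 1.1), (2.1, 6.1, 3.0), (1.9, 3.9, 2.0)],
    [(4.0, 4.9, 2.9), (4.1, 6.0, 1.0), (3.9, 4.1, 2.1)],
]

selected, scores = select_dimensions(train, d_fs=1)
print(selected, [round(s, 3) for s in scores])
```

With only three samples per user, the within-user variances are rough estimates, which is the situation the paragraph above warns about: a poorly estimated score can promote a noisy dimension or demote a useful one.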
4.2.3. Experiment Part III: A meticulous analysis of EP + LSSC discretization with LR-, RL-, and DR-based feature-selection capability
We have seen in part II that the performance of LSSC-based discretization is driven into a stable state, with a trivial level of fluctuation, beyond a certain entropy threshold. On the basis of this observation, it is interesting to find out whether it is possible to estimate a proper range of D_{fs} values that achieves the lowest possible EER in practice for all kinds of experiment settings, and what other aspects a practitioner should take note of when selecting D_{fs} in a real-world implementation. We address these issues below, based on the LR, RL, and DR discriminative measures that have proven their usefulness in the previous subsections.
In the last part of our experiment, we varied the number of users (60 and 200 users for the FERET dataset; 75 and 150 users for the FRGC dataset) and the number of extracted dimensions (D = 50 for FDA; D = 100 for ERE) to observe the performance of the discretization schemes in relation to D_{fs}. The objective of the former variation is to find the minimum D_{fs} that could possibly represent a large or small number of users globally; the aim of the latter variation is to examine the improvement of the feature-selection schemes with respect to the baseline for large and small values of D.
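Table 3. Best D_{fs} (the value or range producing the lowest EER) under the settings of experiment part III.
Feature extraction/dataset  Discriminative measure (no. of users)  Best D_{fs} (lowest EER, %)
FDA (D = 50)/FERET  LR (200)  10-15 (12.60)
FDA (D = 50)/FERET  RL (200)  15-30 (14.00)
FDA (D = 50)/FERET  DR (200)  12-20 (14.60)
FDA (D = 50)/FERET  LR (60)  10-20 (4.60)
FDA (D = 50)/FERET  RL (60)  12-20 (4.90)
FDA (D = 50)/FERET  DR (60)  12-20 (4.80)
ERE (D = 100)/FERET  LR (200)  20-40 (2.00)
ERE (D = 100)/FERET  RL (200)  20-25 (1.76)
ERE (D = 100)/FERET  DR (200)  20-50 (2.45)
ERE (D = 100)/FERET  LR (60)  10 (1.67)
ERE (D = 100)/FERET  RL (60)  20 (1.68)
ERE (D = 100)/FERET  DR (60)  20 (1.82)
FDA (D = 50)/FRGC  LR (150)  12 (23.05)
FDA (D = 50)/FRGC  RL (150)  12 (21.97)
FDA (D = 50)/FRGC  DR (150)  15 (22.40)
FDA (D = 50)/FRGC  LR (75)  12 (21.64)
FDA (D = 50)/FRGC  RL (75)  12 (21.12)
FDA (D = 50)/FRGC  DR (75)  12 (20.93)
ERE (D = 100)/FRGC  LR (150)  20-25 (11.47)
ERE (D = 100)/FRGC  RL (150)  15-25 (11.35)
ERE (D = 100)/FRGC  DR (150)  15-30 (12.35)
ERE (D = 100)/FRGC  LR (75)  20 (9.56)
ERE (D = 100)/FRGC  RL (75)  25 (9.00)
ERE (D = 100)/FRGC  DR (75)  20 (10.63)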
In Figure 7, an interesting observation that applies to all performance curves is that the EER of each discretization scheme initially decreases to some minimum point(s) before rebounding as the number of selected dimensions increases. To explain why this happens, one first needs to understand that an efficient representation of a given number of users requires at least a minimal number of feature dimensions, so that no bit pattern is closely repeated among different users. Taking the performance curves in Figure 7Ia, IIa as an example, using D_{fs} = 5 to represent 60 users and 200 users is apparently not as effective as using D_{fs} = 12, even though D_{fs} = 12 utilizes seven additional less-discriminative dimensions which, intuitively, might be expected to lower classification performance. Beyond the optimal D_{fs} value that produces the minimum-EER performance, our earlier explanation holds: the more less-discriminative dimensions are utilized, the worse the discretization performance becomes.
In Table 3, it is noticed that determining a minimum D_{fs} that best represents a specific number of users across all kinds of experiment settings is infeasible. This can be seen from the contrast that FDA-extracted features with D = 50 require as few as 10, 15, and 12 feature dimensions to best represent 200 users from the FERET database for the LR, RL, and DR discriminative measures, respectively, while ERE-extracted features with D = 100 require at least 20, 25, and 20 features to efficiently represent only 75 users from the FRGC database for the same three measures, respectively. We believe that this is influenced by the different distributions of discriminative measurements over all users produced by different feature-extraction methods.
Nonetheless, given a particular number of users under an experiment setting, determining the proper value of D_{fs} should not rely on the performance aspect alone. The amount of bit redundancy should also be taken into consideration. Recall from the previous subsection that the lower D_{fs} is set, the higher the bit redundancy per user a system has to afford in order to fulfill a specified system entropy. Therefore, a practical strategy is to identify the system's capability for processing the bit redundancy of all users before setting the exact value of D_{fs}, subject to the condition that D_{fs} is not chosen so small as to cause the inefficient user-representation problem.
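The growth of bit redundancy as D_{fs} shrinks can be made concrete with a short calculation. This is a sketch under stated assumptions rather than a formula quoted from the text: each selected dimension is assumed to carry l = L / D_{fs} bits of entropy through S = 2^l equal-probable intervals, and each LSSC label is assumed to be S - 1 bits long. These assumptions are consistent with the pairs L = 300 (l = 3) and L = 200 (l_{fs} = 4) quoted in part II; the function name is illustrative.

```python
def lssc_length(total_entropy_bits, n_dims):
    """Return (entropy bits per dimension, total string length, redundant bits)
    for an assumed (S - 1)-bit LSSC label per dimension with S = 2**l."""
    if total_entropy_bits % n_dims:
        raise ValueError("L must be divisible by the number of dimensions")
    l = total_entropy_bits // n_dims      # entropy per dimension
    intervals = 2 ** l                    # equal-probable intervals S
    length = n_dims * (intervals - 1)     # total bits in the output string
    return l, length, length - total_entropy_bits

# Fixed entropy requirement L = 200 with progressively fewer dimensions.
for d in (100, 50, 25):
    l, length, redundancy = lssc_length(200, d)
    print(f"dims={d:3d}  l={l}  length={length:5d}  redundancy={redundancy:5d}")
```

Under these assumptions, halving the number of dimensions from 100 to 50 raises the string length from 300 to 750 bits, and halving again to 25 dimensions raises it to 6375 bits, which is why the system's capacity for handling redundancy needs to be known before D_{fs} is fixed.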
4.3. Summary
In a nutshell, our findings can be summarized in the following aspects:

BRGC- and DBR-encoding schemes are not appropriate for generating highly discriminative, informative, and privacy-protective bit strings because of their inability to uphold a perfect discrete-to-binary mapping, and thus to preserve performance, when a high entropy requirement is imposed.

Since the LSSC-encoding scheme is able to maintain the discriminativity of the (selected) feature components and drive it into a stable state (with insignificant fluctuations) irrespective of how high the entropy requirement is, this encoding scheme appears to be extremely useful for discriminative and informative bit-string generation.

Our approach integrates high-quality feature extraction, discriminative feature selection, unsupervised quantization, and LSSC encoding to address the performance, security, and privacy criteria of a binary representation. Among the five discriminative measures in our evaluation, the LR, RL, and DR measures exhibit promising discretization performance when they are adopted in our approach.
In general, the amount of improvement of our feature-selection-based approach over the baseline can be influenced by the following three factors:

› The quality of the discriminative measures: LR, RL, and DR are among the reliable ones.

› The accuracy of the pdf estimations, which can greatly affect the decision of feature selection: it all depends on how reliable and representative the training samples are.

› The discriminativity of the unselected feature dimensions: the noisier such feature dimensions are, the higher the improvement will be.


A tradeoff exists between the redundancy of the bit string and the tunable value of the free parameter D_{fs}. The lower D_{fs} is set, the higher the resulting bit redundancy. Thus, a system practitioner should always consider the system's bit-redundancy-processing capability before setting D_{fs}, rather than minimizing it arbitrarily with the aim of attaining the minimum-EER performance. Note also that over-minimizing D_{fs} may lead to inefficient user representation.
5. Conclusion
In this article, we have proposed a four-step approach to generate highly discriminative, informative, and privacy-protective binary representations based on a fixed-bit-allocation principle. The four steps are discriminative feature extraction, discriminative feature selection, equal-probable quantization, and LSSC encoding. Although our binary strings are capable of fulfilling the desired criteria, they could be significantly longer than those of a typical static bit-allocation approach because of the employment of LSSC encoding and feature selection, and thus require greater storage and processing capabilities from the biometric system. We have investigated several existing measures to identify reliable candidates for discretization. Experimental results showed that LR, RL, and DR are among the best discriminative measures, and that a discretization scheme that employs any of these feature-selection measures could achieve a substantial performance improvement over the baseline. The free parameter for feature selection, that is, the number of selected dimensions D_{fs}, should be fixed cautiously. This parameter should not be set too small, to avoid the inefficient user-representation problem and an enormous bit-redundancy overhead. Nor should it be set too large, to avoid only trivial improvement relative to the baseline.
Notes
Acknowledgements
This study was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MEST) (No. 201181095).
Copyright information
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.