
4.1 Background

The first study on kinship verification from facial images was conducted in [7]. In that work, the authors extracted local features, such as skin color, gray value, histogram of gradient, and facial structure information, from facial images and selected a subset of them for kinship verification. Since this seminal work, a growing number of kinship verification methods have been proposed in the literature [4, 7, 8, 11, 13, 16, 18, 20, 22, 23, 24, 25]. These methods can be mainly categorized into two classes: feature-based [4, 7, 8, 24, 25] and model-based [9, 13, 20, 22]. Methods in the first class extract discriminative feature descriptors to represent kin-related information. Representative features include skin color [7], histogram of gradient [7, 17, 24], Gabor wavelet [5, 17, 20, 25], gradient orientation pyramid [25], local binary pattern [13], scale-invariant feature transform [13, 17, 22], salient part [8, 19], self-similarity [11], and dynamic features combined with spatiotemporal appearance descriptors [4]. Methods in the second class learn discriminative models to verify kin relationships from face pairs. Typical models include subspace learning [20], metric learning [13, 22], transfer learning [20], multiple kernel learning [25], and graph-based fusion [9].

Most existing kinship verification methods determine human kin relationships from still face images. Due to the large variations of human faces, a single still image may not be discriminative enough to verify a kin relationship. Compared with a single image, a face video provides more information to describe the appearance of a human face: it can capture the face of the person of interest under different poses, expressions, and illuminations. Moreover, face videos can be easily captured in real applications, because surveillance cameras are widely installed in public areas. Hence, it is desirable to employ face videos to determine the kin relations of persons. However, it is also challenging to exploit the discriminative information of face videos, because intra-class variations are usually larger within a face video than within a single still image.

In this chapter, we investigate the problem of video-based kinship verification via human face analysis. Specifically, we make two contributions to video-based kinship verification. On one hand, we present a new video face dataset called Kinship Face Videos in the Wild (KFVW), which was captured under wild conditions for the study of video-based kinship verification, together with a standard benchmark. On the other hand, we employ our benchmark to evaluate and compare the performance of several state-of-the-art metric learning-based kinship verification methods. Experimental results are presented to demonstrate the efficacy of our proposed dataset and the effectiveness of existing metric learning methods for video-based kinship verification. Finally, we also test the human ability to verify kinship from facial videos, and the experimental results show that metric learning-based computational methods are not yet as good as human observers.

Table 4.1 Comparison of existing facial datasets for kinship verification.
Fig. 4.1

© Reprinted from Ref. [21], with permission from Elsevier

Sampled video frames of our KFVW dataset. Each row lists three face images of a video. From top to bottom are Father–Son (F–S), Father–Daughter (F–D), Mother–Son (M–S) and Mother–Daughter (M–D) kin relationships, respectively.

4.2 Data Sets

In the past few years, several facial datasets have been released to advance research on kinship verification, e.g., CornellKin [7], UB KinFace [20], IIITD Kinship [11], Family101 [6], KinFaceW-I [13], and KinFaceW-II [13]. Table 4.1 provides a summary of existing facial datasets for kinship verification. However, these datasets consist only of still face images, in which each subject usually has a single face image. Due to the large variations of human faces, a single still image may not be discriminative enough to verify a kin relationship. To address these shortcomings, we collected a new video face dataset called Kinship Face Videos in the Wild (KFVW) for the study of video-based kinship verification. Compared with a still image, a face video provides more information to describe the appearance of a human face, because it can easily capture the face of the person of interest under different poses, expressions, and illuminations.

The KFVW dataset was collected from TV shows on the Web. We collected 418 pairs of face videos in total, and each video contains about 100–500 frames with large variations in pose, lighting, background, occlusion, expression, makeup, age, etc. The average size of a video frame is about \(900 \times 500\) pixels. There are four kinship relation types in the KFVW dataset: Father–Son (F–S), Father–Daughter (F–D), Mother–Son (M–S), and Mother–Daughter (M–D), with 107, 101, 100, and 110 pairs of kinship face videos, respectively. Figure 4.1 shows several examples from our KFVW dataset for each kinship relation. We can see that the KFVW dataset depicts the faces of the persons of interest under different poses, expressions, backgrounds, and illuminations, so that it provides rich information to describe the appearance of a human face.

4.3 Evaluation

In this section, we evaluated several state-of-the-art metric learning methods for video-based kinship verification on the KFVW dataset, and provided some baseline results on this dataset.

Fig. 4.2

© Reprinted from Ref. [21], with permission from Elsevier

Cropped face images of our KFVW dataset. Each row lists three face images of a video. From top to bottom are Father–Son (F–S), Father–Daughter (F–D), Mother–Son (M–S) and Mother–Daughter (M–D) kin relationships, respectively.

4.3.1 Experimental Settings

For each video, we first detected the face region of interest in each frame and then resized and cropped each face region to \(64 \times 64\) pixels. Figure 4.2 shows the detected faces of several videos. In our experiments, if a video contains more than 100 frames, we randomly selected 100 frames from it. All cropped face images were converted to grayscale, and we extracted local binary patterns (LBP) [1] from these images. Specifically, we divided each cropped face image of a video into \(8 \times 8\) non-overlapping blocks, where the size of each block is \(8 \times 8\) pixels, and then extracted a 59-bin uniform-pattern LBP histogram for each block and concatenated the histograms of all blocks to form a 3776-dimensional feature vector. To obtain the feature representation of a cropped face video, we averaged the feature vectors of all frames within the video to form a mean feature vector. Then, principal component analysis (PCA) was employed to reduce the dimensionality of each vector to 100 dimensions.
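The per-video LBP representation described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under our own helper names (`lbp_code_map`, `lbp_image`, `video_feature`), not the authors' original code:

```python
import numpy as np

def lbp_code_map():
    # Map each of the 256 raw 8-bit LBP codes to one of 59 bins: the 58
    # "uniform" patterns (at most two 0/1 transitions around the circle)
    # get individual bins, and all non-uniform codes share bin 58.
    table = np.full(256, 58, dtype=np.int64)
    nxt = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = nxt
            nxt += 1
    return table

def lbp_image(gray):
    # Raw 8-neighbor LBP codes for the interior pixels of a 2-D array.
    c = gray[1:-1, 1:-1]
    neighbors = [gray[:-2, :-2], gray[:-2, 1:-1], gray[:-2, 2:],
                 gray[1:-1, 2:], gray[2:, 2:], gray[2:, 1:-1],
                 gray[2:, :-2], gray[1:-1, :-2]]
    code = np.zeros(c.shape, dtype=np.int64)
    for bit, n in enumerate(neighbors):
        code |= (n >= c).astype(np.int64) << bit
    return code

def video_feature(frames, table, block=8):
    # frames: iterable of 64x64 grayscale arrays. Returns the mean of the
    # per-frame 3776-dim vectors (8x8 grid of blocks x 59-bin histograms).
    feats = []
    for f in frames:
        codes = table[lbp_image(np.asarray(f, dtype=np.float64))]
        codes = np.pad(codes, 1, mode='edge')  # back to 64x64 for the grid
        hists = [np.bincount(codes[r:r + block, c:c + block].ravel(),
                             minlength=59)
                 for r in range(0, 64, block) for c in range(0, 64, block)]
        feats.append(np.concatenate(hists).astype(np.float64))
    return np.mean(feats, axis=0)
```

For a \(64 \times 64\) frame, the interior LBP codes are padded back to \(64 \times 64\) so that the \(8 \times 8\) block grid yields \(64 \times 59 = 3776\) histogram bins, matching the dimensionality quoted above.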

Table 4.2 The EER (%) and AUC (%) of several metric learning methods using LBP feature on the KFVW dataset.

In this benchmark, we used all positive pairs for each kinship relation and generated the same number of negative pairs. A positive pair (or true pair) means that there is a kin relationship between a pair of face videos, whereas a negative pair (or false pair) means that there is none. Specifically, a negative pair consists of two videos: one was randomly selected from the parents' set, and the other was randomly selected from the children's set such that the child is not the true child of the selected parent. For each kinship relation, we randomly took 80% of the video pairs for model training and the remaining 20% for testing. We repeated this procedure 10 times and recorded the Receiver Operating Characteristic (ROC) curve for performance evaluation, from which two measures, the Equal Error Rate (EER) and the Area Under the ROC Curve (AUC), were computed to report the performance of the various metric learning methods for video-based kinship verification. Note that a smaller EER and a larger AUC indicate better performance.
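Given verification scores for the test pairs (e.g., negative distances, so that a larger score means "more likely kin"), the two measures can be computed as in the following minimal NumPy sketch; the helper names are our own, not the authors' evaluation code:

```python
import numpy as np

def roc_points(scores, labels):
    # Sweep the decision threshold over the scores, sorted in decreasing
    # order; labels: 1 = kin (positive) pair, 0 = non-kin (negative) pair.
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def eer_and_auc(scores, labels):
    fpr, tpr = roc_points(scores, labels)
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoidal rule
    # EER: the operating point where the false-accept rate equals the
    # miss rate (1 - TPR).
    i = np.argmin(np.abs(fpr - (1.0 - tpr)))
    eer = (fpr[i] + 1.0 - tpr[i]) / 2
    return float(eer), float(auc)
```

A perfectly separating scorer gives EER = 0 and AUC = 1; random scoring gives roughly EER = 0.5 and AUC = 0.5.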

Fig. 4.3

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the LBP feature on our KFVW dataset for the Father–Son kinship relation.

Fig. 4.4

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the LBP feature on our KFVW dataset for the Father–Daughter kinship relation.

4.3.2 Results and Analysis

This subsection presents the results and analysis of the different methods on the KFVW dataset for video-based kinship verification.

Fig. 4.5

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the LBP feature on our KFVW dataset for the Mother–Son kinship relation.

Fig. 4.6

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the LBP feature on our KFVW dataset for the Mother–Daughter kinship relation.

4.3.2.1 Comparison of Different Metric Learning Methods

We first evaluated several metric learning methods using LBP features for video-based kinship verification and provide baseline results on the KFVW dataset. The baseline methods include Euclidean, ITML [3], SILD [10], KISSME [12], and CSML [14]. The Euclidean method computes the similarity/dissimilarity between a pair of face videos by the Euclidean distance in the original feature space. Each metric learning method first learns a distance metric from the training data and then employs the learned metric to calculate the distance between a pair of videos from the testing data. Table 4.2 shows the EER (%) and AUC (%) of these metric learning methods using the LBP feature on the KFVW dataset. From this table, we see that (1) CSML obtains the best performance in terms of the mean EER and mean AUC, and also achieves the best EER and AUC on the F–S and M–S subsets; (2) ITML shows the best performance on the M–D subset; (3) SILD obtains the best EER and AUC on the F–D subset; (4) all metric learning-based methods, i.e., ITML, SILD, KISSME, and CSML, outperform the Euclidean method in terms of EER and AUC; (5) most methods achieve their best performance on the F–S subset compared with the other three subsets; and (6) the best EER is still about 38.5%, which indicates that video-based kinship verification on the KFVW dataset is extremely challenging. Moreover, Figs. 4.3, 4.4, 4.5 and 4.6 plot the ROC curves of several metric learning methods using the LBP feature on the KFVW dataset for the four types of kinship relations.
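To make the model-based pipeline concrete, here is a minimal sketch in the spirit of KISSME [12]: a Mahalanobis-like metric is formed from the inverse covariances of positive- and negative-pair feature differences. This is a simplified NumPy illustration with our own function names, not the authors' implementation:

```python
import numpy as np

def kissme_metric(pos_diffs, neg_diffs, eps=1e-6):
    # pos_diffs / neg_diffs: (n, d) arrays of feature differences x - y
    # over positive (kin) and negative (non-kin) training pairs.
    d = pos_diffs.shape[1]
    cov_pos = pos_diffs.T @ pos_diffs / len(pos_diffs) + eps * np.eye(d)
    cov_neg = neg_diffs.T @ neg_diffs / len(neg_diffs) + eps * np.eye(d)
    M = np.linalg.inv(cov_pos) - np.linalg.inv(cov_neg)
    # Clip negative eigenvalues so M induces a valid (pseudo-)metric.
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T

def learned_distance(M, x, y):
    # Squared distance under the learned metric M.
    diff = np.asarray(x) - np.asarray(y)
    return float(diff @ M @ diff)
```

A test pair is then classified as kin if its learned distance falls below a threshold chosen on the training split (e.g., at the EER operating point).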

Table 4.3 The EER (%) and AUC (%) of several metric learning methods using HOG feature on the KFVW dataset.

4.3.2.2 Comparison of Different Feature Descriptors

We also evaluated several state-of-the-art metric learning methods using different feature descriptors. To this end, we extracted the histogram of oriented gradients (HOG) [2] at two different scales for each cropped face image. Specifically, we first divided each image into \(16 \times 16\) non-overlapping blocks, where the size of each block is \(4 \times 4\) pixels. Then, we divided each image into \(8 \times 8\) non-overlapping blocks, where the size of each block is \(8 \times 8\) pixels. Subsequently, we extracted a 9-dimensional HOG feature for each block and concatenated the HOG features of all blocks to form a 2880-dimensional feature vector. Following the same procedure as for LBP, for each cropped face video we averaged the feature vectors of all frames within the video to yield a mean feature vector as the final feature representation. Then, PCA was employed to reduce the dimensionality of each vector to 100 dimensions.
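The two-scale HOG described above can be sketched as follows. This is a simplified, unnormalized variant in NumPy with our own helper names; the original HOG descriptor [2] additionally applies block normalization, which we omit for brevity:

```python
import numpy as np

def hog_blocks(gray, block, bins=9):
    # Simplified unsigned-gradient HOG: one 9-bin orientation histogram
    # per block, weighted by gradient magnitude (no block normalization).
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h, w = gray.shape
    hists = [np.bincount(idx[r:r + block, c:c + block].ravel(),
                         weights=mag[r:r + block, c:c + block].ravel(),
                         minlength=bins)
             for r in range(0, h, block) for c in range(0, w, block)]
    return np.concatenate(hists)

def two_scale_hog(gray):
    # 16x16 grid of 4x4-pixel blocks (2304 dims) plus an 8x8 grid of
    # 8x8-pixel blocks (576 dims) -> 2880-dimensional descriptor.
    return np.concatenate([hog_blocks(gray, 4), hog_blocks(gray, 8)])
```

The \(16 \times 16\) grid of \(4 \times 4\)-pixel blocks contributes \(256 \times 9 = 2304\) dimensions and the \(8 \times 8\) grid contributes \(64 \times 9 = 576\), giving the 2880-dimensional vector quoted above.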

Fig. 4.7

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the HOG feature on our KFVW dataset for the Father–Son kinship relation.

Fig. 4.8

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the HOG feature on our KFVW dataset for the Father–Daughter kinship relation.

Fig. 4.9

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the HOG feature on our KFVW dataset for the Mother–Son kinship relation.

Fig. 4.10

© Reprinted from Ref. [21], with permission from Elsevier

ROC curves of several metric learning methods using the HOG feature on our KFVW dataset for the Mother–Daughter kinship relation.

Table 4.3 reports the EER (%) and AUC (%) of several metric learning methods using the HOG feature on the KFVW dataset, and Figs. 4.7, 4.8, 4.9 and 4.10 show the ROC curves of these methods using the HOG feature. From this table, we see that (1) SILD achieves the best performance in terms of the mean EER and mean AUC, and also obtains the best EER on the F–D and M–S subsets; and (2) KISSME obtains the best AUC on the F–D and M–S subsets. By comparing Tables 4.2 and 4.3, we see that metric learning methods using the LBP feature outperform the same methods using the HOG feature in terms of the mean EER and mean AUC. The reason may be that the LBP feature captures local texture characteristics of face images, which are more useful for video-based kinship verification than the gradient characteristics extracted by the HOG feature.

4.3.2.3 Parameter Analysis

We investigated how the dimension of the LBP feature affects the performance of these state-of-the-art metric learning methods. Figures 4.11, 4.12, 4.13 and 4.14 show the EER (%) and AUC (%) of the ITML, SILD, KISSME, and CSML methods versus the dimension of the LBP feature on the KFVW dataset for the four types of kin relationships, respectively. From these figures, we see that (1) the ITML and CSML methods show relatively stable AUC on the four subsets (i.e., F–S, F–D, M–S, and M–D) as the dimension of the LBP feature increases from 10 to 100; and (2) the SILD and KISSME methods achieve their best AUC at a dimension of 30, and their AUC then gradually decreases as the dimension increases from 30 to 100. Therefore, we reported the EER and AUC of these metric learning methods at a dimension of 30 on the four subsets for fair comparison.
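The dimension sweep described above amounts to fitting PCA once on the training features and truncating to each target dimension. A minimal sketch follows, with our own helper names and hypothetical variables `train_feats`/`test_feats` standing in for the LBP feature matrices:

```python
import numpy as np

def fit_pca(X):
    # X: (n, d) training feature matrix. Returns the mean and the
    # principal axes sorted by decreasing explained variance.
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt

def project(X, mu, vt, dim):
    # Keep only the leading `dim` principal components.
    return (X - mu) @ vt[:dim].T

# Hypothetical sweep over the LBP feature dimension, as in the experiments:
# mu, vt = fit_pca(train_feats)
# for dim in range(10, 101, 10):
#     train_low = project(train_feats, mu, vt, dim)
#     test_low = project(test_feats, mu, vt, dim)
#     ...train the metric learning method and record EER/AUC...
```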

4.3.2.4 Computational Cost

We conducted experiments on a standard Windows machine (Intel i5-3470 CPU @ 3.20 GHz, 32 GB RAM) with MATLAB code. Given a face video, detecting the face region of interest in a frame takes about 0.9 s, and extracting the LBP feature of a \(64 \times 64\) cropped face image takes about 0.02 s. In model training, the training times of the ITML, SILD, KISSME, and CSML methods are around 9.6, 0.6, 0.7, and 6.5 s per kin relationship, respectively. In testing, the matching time of these methods is about 0.02 s per pair of face videos (excluding the time for face detection and feature extraction).

Fig. 4.11

© Reprinted from Ref. [21], with permission from Elsevier

The EER (%) and AUC (%) of the ITML method using the LBP feature on the KFVW dataset.

Fig. 4.12

© Reprinted from Ref. [21], with permission from Elsevier

The EER (%) and AUC (%) of the SILD method using the LBP feature on the KFVW dataset.

Fig. 4.13

© Reprinted from Ref. [21], with permission from Elsevier

The EER (%) and AUC (%) of the KISSME method using the LBP feature on the KFVW dataset.

Fig. 4.14

© Reprinted from Ref. [21], with permission from Elsevier

The EER (%) and AUC (%) of the CSML method using the LBP feature on the KFVW dataset.

4.3.2.5 Human Observers for Kinship Verification

As another baseline, we also evaluated the human ability to verify kin relationships from face videos on the KFVW dataset. For each kinship relation, we randomly chose 20 positive pairs and 20 negative pairs of face videos, and displayed these video pairs to ten volunteers who decided whether or not a kin relationship exists. The volunteers consisted of five male and five female students aged 18 to 25 years, none of whom had received any training in verifying kin relationships from face videos. We designed two tests (i.e., Test A and Test B) to examine the human ability to verify kin relationships from face videos. In Test A, the cropped face videos were provided to the volunteers, who made their decisions based on the detected face regions of size \(64 \times 64\) pixels. In Test B, the original face videos were presented, so the volunteers could make their decisions by exploiting multiple cues in the whole images, e.g., skin color, hair, race, background, etc. Table 4.4 lists the mean verification accuracy (%) of human observers on video-based kinship verification for the different types of kin relationships on the KFVW dataset. We see that Test B yields better performance than Test A on all four kinship relations. The reason is that Test B allows exploiting additional cues, such as hair and background, to help make correct kinship decisions. From this table, we also observe that human observers achieve higher verification accuracy than the metric learning-based methods on the KFVW dataset.

From the experimental results shown in Tables 4.2, 4.3 and 4.4 and Figs. 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.10, 4.11, 4.12, 4.13 and 4.14, we make the following observations:

  • State-of-the-art metric learning methods outperform the predefined metric-based method (i.e., Euclidean distance) for video-based kinship verification. The reason is that a metric learning method can learn a distance metric from the training data to increase the similarity of positive pairs and decrease the similarity of negative pairs in the learned metric space.

  • The LBP feature achieves better performance than the HOG feature for video-based kinship verification. The reason may be that the LBP feature encodes local texture characteristics of face images, which are more useful for video-based kinship verification than the gradient characteristics extracted by the HOG feature.

  • Metric learning methods and human observers achieve poorer performance on the F–D subset than on the other three subsets, which shows that kinship verification on the F–D subset is a more challenging task.

Table 4.4 The mean verification accuracy (%) of human observers on video-based kinship verification on the KFVW dataset for four types of kin relationships.