
1 Introduction

From a security perspective, face analysis is critical in many real-world scenarios. Face detection [1] and recognition [2] are the two best-known face analysis tasks. Beyond these, facial expression analysis [3], physiognomy analysis [4], face blindness, and face annotation [5] are also important in many real-time applications. Real-time applications of face recognition include payments, security, law enforcement, advertising, and health care. Any such system can only be meaningful and robust if it handles the different variations of input faces, such as expression, occlusion, pose, and their combinations.

Face recognition is an active task in computer vision. A face recognition system is only useful if it can handle pose-variant faces as test inputs. 2D face recognition systems suffer from low recognition rates when illumination- and pose-variant face images are used as inputs. These problems are handled more easily with 3D faces, since 3D data carries extra depth information, and even extreme pose variation, i.e., across 90°, can be resolved by registering/aligning the pose-variant face to a reference frontal face model. Registration is therefore an initial problem for researchers developing any face analysis system. Recognition accuracy as a function of registration error is crucial when establishing a system in any real-world environment; the dependence of recognition accuracy on registration accuracy is shown experimentally in an upcoming section of this chapter. Training-based face recognition systems can tolerate misalignment of faces, and some global feature-based approaches such as local binary patterns (LBP), principal component analysis (PCA), and Gabor wavelet features can handle small misalignments, but misalignment remains problematic for local feature-based applications. Beyond face analysis, registration is also essential in various clinical applications [6]. There are two main categories of registration: rigid and non-rigid/deformable. In the medical research domain, deformable registration is widely used in applications such as radiation therapy [7, 8], physical brain deformation in neurosurgery, and lung motion estimation [9]. Registration is also applicable to heritage reconstruction and industrial applications.

After the acquisition stage, registration/alignment is, along with noise removal, one of the essential preprocessing stages. Registration is the process of transforming sets of points into a single coordinate system. Generally, a registration process needs two objects from a given domain: a source object denoted by \(S \in {\mathbb{R}}^{d}\) and a target object denoted by \(T \in {\mathbb{R}}^{d} , d = \left\{ {2,3} \right\}\). The generalized expression of the registration process can be written as in Eq. 1.

$$\widehat{T} = \mathop{\text{argmax}}\limits_{F \in {\mathbb{F}}} M\left( {S,T,F} \right)$$
(1)

where F denotes the transformation function, \({\mathbb{F}}\) the search space, M the similarity measure, \(\widehat{T}\) the solution, and argmax the optimization over the search space.
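The optimization in Eq. 1 can be illustrated with a minimal sketch: a brute-force search over a one-parameter family of 2D rotations that maximizes a simple similarity measure. All names and the toy data below are illustrative, not from the chapter.

```python
import numpy as np

def similarity(S, T):
    """Negative mean squared distance between corresponding points,
    a simple stand-in for the similarity measure M."""
    return -np.mean(np.sum((S - T) ** 2, axis=1))

def register_by_search(S, T, angles):
    """Brute-force search over the family F of 2D rotations,
    returning the angle that maximizes M(S, T, F)."""
    best_angle, best_score = None, -np.inf
    for a in angles:
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, -s], [s, c]])
        score = similarity(S @ R.T, T)
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# Toy example: T is S rotated by 30 degrees.
S = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
theta = np.deg2rad(30)
c, s = np.cos(theta), np.sin(theta)
T = S @ np.array([[c, -s], [s, c]]).T
est = register_by_search(S, T, np.deg2rad(np.arange(0, 360, 1)))
```

In practice, the search space \({\mathbb{F}}\) is continuous and high-dimensional, so real systems use gradient-based or iterative optimizers rather than exhaustive search.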

The rest of the chapter is organized into the following sections. Section 2 illustrates the details of pose orientation of the 3D face. A brief review of pose detection techniques is given in Sect. 3. The categories of registration techniques are described in Sect. 4. Section 5 gives a detailed history of 3D face registration research. Next, Sect. 6 discusses the results. Section 7 concludes the chapter.

2 Details of Pose Orientation of 3D Face

In 3D, three basic orientations of an object are possible: about the X-axis, the Y-axis, and the Z-axis.

Similarly, a human face can be oriented about the X-axis (denoted roll), the Y-axis (denoted yaw), and the Z-axis (denoted pitch). Beyond these, mixed orientations, such as rotation about the X- and Y-axes together, also occur in some 3D face databases. Several well-known published 3D databases exist, such as Frav3D, Bosphorus3D, Casia3D, and GavabDB. 3D data can be categorized in two ways: (1) synthetic data (3D construction from 2D images) and (2) real data (scanned with a 3D acquisition device). Various methods exist for creating synthetic 3D data, while depth cameras such as structured light scanners, laser scanners, and the Kinect 3D scanner are used for acquiring real 3D data. Table 1 gives detailed descriptions of the pose-variant images of different databases, captured using different 3D scanners. Next, Fig. 1 shows the different pose orientations of the Frav3D face database. Acquired 3D data is generally represented in one of three ways: as a range/depth/2.5D image, as a 3D point cloud in 3D coordinates, or as a 3D mesh. Figure 2 shows these different representations of 3D face data.

Table 1 Description of pose-variant images of different databases
Fig. 1 Variation of face pose

Fig. 2 Different representations of 3D face data
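The relation between the point cloud and range/2.5D representations can be sketched as follows: projecting a point cloud onto a grid in the XY plane and keeping, per cell, the largest Z value. This is an illustrative simplification (real range images come directly from the scanner), and the function name and grid parameters are assumptions.

```python
import numpy as np

def point_cloud_to_depth(points, grid_size=64):
    """Project a 3D point cloud (N x 3) onto an XY grid and keep, for
    each cell, the largest Z value, giving a crude 2.5D depth image."""
    xy = points[:, :2]
    z = points[:, 2]
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    # Normalize XY coordinates into integer grid indices.
    idx = ((xy - mins) / (maxs - mins + 1e-9) * (grid_size - 1)).astype(int)
    depth = np.zeros((grid_size, grid_size))
    for (ix, iy), d in zip(idx, z):
        depth[iy, ix] = max(depth[iy, ix], d)
    return depth

pts = np.random.default_rng(1).uniform(size=(200, 3))
depth = point_cloud_to_depth(pts, grid_size=32)
```

A mesh representation would additionally store triangle connectivity between the points, which this sketch omits.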

3 Review on 3D Pose Detection Techniques

Before registering 3D faces, pose detection is a significant task for real-time face analysis applications. Over the last few decades, researchers have established various head pose detection techniques. Here, we review some recent pose detection techniques.

In [10], the authors detect head pose using an adaptive 3D matched filter. Further, a combination of particle swarm optimization (PSO) and the iterative closest point (ICP) algorithm is used to register a morphable face model to the measured depth data. Next, in [11], a random regression forest is used to estimate the 3D head pose in real time from depth data. It learns a mapping between depth features and parameters such as 3D head position and rotation angles; the authors extend the regression forest to discriminate depth patches for pose prediction and head classification. In [12], the authors use a convolutional neural network (CNN)-based framework named POSEidon+ that exploits a recent deterministic conditional generative adversarial network (GAN) model to generate grayscale images from depth data for head pose estimation. In [13], the authors use a geometric model to find the pose orientation, detecting pose from both single-axis and combined-axis rotated faces. Initially, nose-tip and eye-corner landmarks are detected on the pose-variant 3D faces. For single-axis rotation, they first segregate Z-axis rotated faces from the other pose-variant faces using a horizontal line through the eye corners: for a Z-axis rotated face, no horizontal line exists between the two eye corners. Then, to segregate X- from Y-axis rotations, the nose-tip point is used: for X-axis rotated faces, the deviation of the nose tip along the Y-axis is larger than along the X-axis, and vice versa for Y-axis rotated faces. For combined/composite orientations, they apply the preceding single-axis tests pairwise across the X-, Y-, and Z-axis rotations. In [14], the authors exploit the bilateral symmetry of the face and estimate the pose from the central profile, the intersection curve between a symmetry plane and the 3D surface. The curve starts at the forehead and ends at the chin, and the normal at each point of the curve is parallel to the symmetry plane. The work generates a symmetry plane from the central profile curve, and the pose is detected from the deviation angle of the symmetry plane. Currently, most researchers use CNN-based techniques for head pose detection from 2D face data.
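The nose-tip deviation cue of [13] can be illustrated with a rough sketch: compare the nose-tip displacement along X and Y between the frontal and rotated faces to guess the dominant rotation axis. This is only an illustration of the geometric idea, not the authors' implementation; the function name and coordinates are assumed.

```python
import numpy as np

def dominant_rotation_axis(nose_frontal, nose_rotated):
    """Rough cue: a Y-axis (yaw) rotation displaces the nose tip mainly
    along X, while an X-axis (pitch) rotation displaces it mainly along Y."""
    dx, dy = np.abs(np.asarray(nose_rotated[:2]) - np.asarray(nose_frontal[:2]))
    return "Y-axis" if dx > dy else "X-axis"

# Nose tip at (0, 0, 100) on a frontal face; a 20-degree yaw moves it along X.
frontal = np.array([0.0, 0.0, 100.0])
a = np.deg2rad(20)
Ry = np.array([[np.cos(a), 0.0, np.sin(a)],
               [0.0,       1.0, 0.0],
               [-np.sin(a), 0.0, np.cos(a)]])
axis = dominant_rotation_axis(frontal, Ry @ frontal)
```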

4 Registration Technique

Registration is an essential preprocessing operation in any real-time face analysis system; it aligns different 3D views into a standard coordinate system. In a face recognition system, registration is a crucial step for overall recognition performance. Various registration techniques exist in the 2D and 3D domains. In 3D registration, a full model reconstruction is usually performed, so an efficient technique is needed to keep processing time low when a massive number of point clouds must be registered. After registration, there are no comparably optimized techniques for matching. Registration techniques can be divided into three categories: whole face, point, and part registration.

4.1 Whole Face Registration

In this technique, registration is done using the full face region. It can be further divided into two distinct parts: rigid and non-rigid.

Rigid Registration

Rigid registration is a global transformation [15] through some landmark points; it is also called a Euclidean transformation. It includes rotations, translations, reflections, and their combinations. This registration technique is very simple compared to others. Generally, most systems use landmarks based on the eye, nose, and mouth positions [16, 17] for the transformation. In some cases, many points (nearly 50–60) are used for the transformation; this is termed active appearance model (AAM)-based [18] transformation. These model-based techniques are more sensitive to registration error than the landmark-based approaches. Some traditional rigid registration techniques [19,20,21] register the surfaces using a 3D point cloud. An iterative algorithm is well suited to estimating the rigid displacement between two separate point clouds of the surfaces.

Iterative Closest Point

This is one of the most popular rigid registration techniques for 3D data in any domain. The iterative algorithm superposes two surfaces, S1 onto S2, by estimating the rigid displacement (R, t) of one surface, where R denotes an orthogonal transformation and t the translation vector of the origin. Each iteration i of the algorithm consists of two steps, detailed below.

Algorithm ICP

  • Step 1: For each point P of S1, a pair (P, Q) is added to Matchi, where Q is the closest point of S2 to the point R(i-1)*P + t(i-1). The distance map method [22] is generally used for computing the closest point.

  • Step 2: In this step, the error metric is calculated to obtain the accurate displacement.

    Generally, the least squares method [23] is used for the error calculation.

The full algorithm consists of six main steps.

  • Selection of a set of points from both point clouds

  • Matching of points between the two point clouds

  • Weighting of the corresponding pairs

  • Rejection of certain pairs after checking all pairs

  • Assignment of an error metric

  • Minimization of the error.

The algorithm is an iterative process of minimizing the distance, so it is terminated based on any of three criteria: (1) a given threshold on the distance measurement, (2) the difference of distance values between two successive iterations falling below a tolerance, or (3) reaching the maximum number of iterations.
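The two-step loop and the termination criteria above can be sketched as a minimal NumPy ICP. This is an illustrative implementation under simplifying assumptions (brute-force nearest-neighbor matching, no weighting or pair rejection, least-squares estimation via SVD); production systems use k-d trees and robust variants.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping P onto Q via the SVD
    of the cross-covariance matrix (the Step 2 error minimization)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

def icp(P, Q, max_iter=50, tol=1e-8):
    """Minimal ICP: match each point of P to its closest point in Q
    (Step 1), estimate and apply (R, t) (Step 2), and repeat until the
    error stops improving (termination criterion 2) or max_iter is hit."""
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iter):
        d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
        matches = Q[d.argmin(axis=1)]
        R, t = best_rigid_transform(P, matches)
        P = P @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.mean(np.linalg.norm(P - matches, axis=1))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, P

# Toy example: Q is P rotated by 5 degrees about the Z-axis.
rng = np.random.default_rng(0)
P = rng.uniform(size=(20, 3))
a = np.deg2rad(5.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
Q = P @ Rz.T
R_est, t_est, aligned = icp(P, Q)
```

Because the initial misalignment here is small, the closest-point matches are mostly correct from the first iteration; ICP famously needs a rough pre-alignment for larger rotations.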

Non-rigid Registration

Here, the transformation involves localized stretching of the image, allowing a non-uniform mapping between images. AAM is also used for non-rigid registration by applying a piecewise affine transformation around each landmark [24]; it also captures small changes in faces, such as expression. Facial sequence registration with the well-known SIFT-flow technique [25] is also possible. Nowadays, much of the most exciting and challenging registration work is non-rigid, owing to soft-tissue deformation during imaging.

Affine Transformation

An affine transformation is similar to a rigid transformation but includes extra operations such as scaling and shearing, so after registration the shape of the object is changed. Because of this change of shape or scale, it is treated as non-rigid.
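The difference can be made concrete with homogeneous 2D matrices: composing a rigid transform with a scale/shear matrix yields an affine map that no longer preserves distances. The matrices below are arbitrary illustrative values.

```python
import numpy as np

theta = np.deg2rad(20)
c, s = np.cos(theta), np.sin(theta)
rigid = np.array([[c, -s, 1.0],
                  [s,  c, 2.0],
                  [0.0, 0.0, 1.0]])           # rotation + translation only
scale_shear = np.array([[1.5, 0.3, 0.0],
                        [0.0, 0.8, 0.0],
                        [0.0, 0.0, 1.0]])     # scaling + shearing
affine = rigid @ scale_shear

# Distances survive the rigid map but not the full affine map.
p1 = np.array([0.0, 0.0, 1.0])                # point (0, 0), homogeneous
p2 = np.array([1.0, 0.0, 1.0])                # point (1, 0), homogeneous
d_before = np.linalg.norm(p2[:2] - p1[:2])
d_rigid = np.linalg.norm((rigid @ p2)[:2] - (rigid @ p1)[:2])
d_affine = np.linalg.norm((affine @ p2)[:2] - (affine @ p1)[:2])
```

Here `d_rigid` equals `d_before`, while `d_affine` is stretched by the scale factor, which is exactly why affine registration is classed as non-rigid.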

4.2 Point-Based Registration

Point-based registration is also termed point matching. Two individual point sets are aligned through a spatial transformation, and correctly identifying the fiducial points is an important part of this type of registration. The active appearance model (AAM) [18] is widely used for point-based registration. The technique can also use local features extracted directly from the image, such as detected corners. Point set techniques can work on raw 3D point cloud data. Let {P, Q} be two finite point sets in a finite-dimensional vector space \({\mathbb{R}}^{d}\), and let T(P) denote the point set P after a spatial transformation T. The main goal is then to minimize the squared Euclidean distance between the two point sets, as in Eq. 2.

$${\text{dist}}\left( {T(P),Q} \right) = \mathop \sum \limits_{p \in T\left( P \right)} \mathop \sum \limits_{q \in Q} \left( {p - q} \right)^{2}$$
(2)

Rigid and non-rigid registrations are also part of point-based registration.
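The cost in Eq. 2 can be evaluated directly: below, a toy point set offset from its target shows that the all-pairs squared-distance sum drops when the transformation brings the two sets together. The function name and data are illustrative only.

```python
import numpy as np

def pairwise_cost(TP, Q):
    """All-pairs sum of squared distances between the transformed point
    set T(P) and Q, as in Eq. 2."""
    diff = TP[:, None, :] - Q[None, :, :]
    return float(np.sum(diff ** 2))

# A translation that brings the centroids together lowers the cost.
rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 3))
offset = np.array([2.0, -1.0, 0.5])
P = Q + offset                       # source set, shifted away from target
cost_before = pairwise_cost(P, Q)
cost_after = pairwise_cost(P - offset, Q)
```

Over translations alone, this cost is minimized exactly when the two centroids coincide, which is why many registration pipelines begin by aligning centroids before estimating rotation.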

4.3 Part-Based Registration

Registration with respect to some major parts of the object is classified as part-based registration. For the human face, registration in terms of the eyes, mouth, etc., requires spatial consistency of each part. The parameters (the number, size, and location of the parts) vary according to the requirements. In AAM, the parts are localized through patches around the detected landmarks. Part-based registration is treated as non-rigid/affine registration when the shape of the object changes.

4.4 Registration Using Model-Based Approach

Model-based registration approaches include active appearance model (AAM) [18]- and active shape model (ASM) [26]-based alignment. These models are mainly used for human faces.

AAM is a useful method for locating objects. At AAM creation time, many images with different shapes are used to train the model; some crucial points are then selected for annotation, and a series of transformations, such as PCA on the mean shape, is used for face alignment. Initially, the model is estimated at a fixed position; further small suggested movements then give the proper alignment.

ASM is a statistical model of the shape of an object that is iteratively fitted to an example object in a new image. ASM is driven by a point distribution model (PDM) and uses a local texture model for face alignment.

4.5 Pros and Cons of Registration Techniques

Registration techniques are mainly divided into two classes: rigid and non-rigid. From the transformation point of view, rotation and translation operations are used in the rigid registration process, whereas non-rigid registration also permits transformations such as scaling and shearing; with these operations included, the transformation is termed affine. The advantages and disadvantages of rigid and non-rigid registration are discussed below.

Rigid global transformations are easy to implement; for 3D image registration, six parameters are used (three for translation and three for rotation), and the approach is well established as a research tool. Non-rigid registration, in contrast, allows a non-uniform mapping between images and corrects small, varying discrepancies by deforming one image to match the other. However, non-rigid techniques are only partly used, owing to difficulties in validation. Validation means showing that a registration algorithm, applied to the data of an application, consistently succeeds with a maximum (or average) error acceptable for that application. Landmark-based rigid registration can be validated quickly: since landmark distances between the source and target models are preserved under rigid registration, the error analysis is well studied and the target registration error for the whole volume can be estimated accurately from the landmark positions. In non-rigid registration, the landmark distances between the source and target models differ, so the maximum or average registration error cannot be estimated from the landmark positions.
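The six-parameter rigid transform mentioned above can be sketched as a 4x4 homogeneous matrix built from three rotation angles and three translations. The rotation order (Rz Ry Rx) is a common convention, assumed here for illustration.

```python
import numpy as np

def rigid_transform_3d(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 homogeneous rigid transform from six parameters:
    three rotation angles in radians (applied as Rz @ Ry @ Rx) and
    three translations."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T

T = rigid_transform_3d(0.1, -0.2, 0.3, 1.0, 2.0, 3.0)
```

Because the 3x3 block is orthogonal, this transform preserves all inter-point distances, which is precisely the property that makes landmark-based validation of rigid registration straightforward.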

5 History of Registration Technique for 3D Face Analysis

3D face registration is one of the essential preprocessing steps for developing any robust 3D face analysis system. Pose misalignment cannot be handled easily for 2D faces, owing to the lack of depth information, whereas 3D data easily overcomes this problem. In this section, we discuss some recently proposed 3D face registration techniques.

In [27], the authors develop 3D face meshes of different orientations from 2D stereo images. Given the 3D models of different orientations of a subject, they consider two symmetrically oriented view models for registration and identify some significant landmark points on them. Further, they propose a spatial–temporal logical correlation (STLC) algorithm for accurate disparity mapping with a coarse-to-fine strategy; finally, 3D data of the whole face region is obtained by fusing the left and right measurements. In [28], the registration of pose-variant inputs uses the three-dimensional variance of the facial data to transform them into a frontal pose. Unlike traditional registration algorithms that work on two images (source and target), the proposed algorithm works on a single image: the test image is rotated, the variance in the XY, YZ, and ZX planes is calculated separately, and the resulting variance yields the registered image. In [29], the authors introduce a novel 3D alignment method that works on frontal as well as right- and left-profile face images. First, pose is detected by a nose tip-based pose learning approach. After detection, coarse-to-fine registration is performed using L2 norm minimization, followed by a transformation process to align the whole face; the alignment is done in three separate planes, XY, YZ, and ZX. Accurate nose-tip detection on the different probe and frontal faces is the main challenge of this work. Next, in [30], 3D face sequences are used to implement the proposed registration technique. The authors use a regression-based static registration method that is improved by spatiotemporal modeling to exploit redundancies over space and time. The method is a fully geometric approach and does not require any predetermined landmarks as input.
The main innovation is handling dynamic 3D data for registration, whereas most proposed techniques work with static data. Another approach to 3D face registration is shown in [31], where a 3D morphable model (3DMM) is constructed for a face recognition system. The authors consider different populations, such as Black and Chinese, and construct the morphable model as a mixture of Gaussian subpopulations, named the Gaussian mixture 3D morphable model (GM3DMM); the global population is treated as a mixture of subpopulations. As in previous works, these models are used for 2D face recognition, where the 2D images are mapped onto the model, and they also address the pose variation issue of face images. Next, in [32], the authors reconstruct three-dimensional facial structure from a single image using an unsupervised learning approach and then remove the pose variation problem by applying a viewpoint transformation on the 3D structure. An affine transformation warps the source to the target face using a neural network; the authors name the proposed network Depth Estimation-Pose Transformation Hybrid Networks (DepthNet for short). For rotation, the predicted depth values and the affine parameters are generated through a pseudo-inverse transformation of some key points of the source and target. In [33], 3D face registration is done using a novel conformal mapping algorithm based on harmonic energy: minimizing the harmonic energy under a specific boundary condition on the surface yields the conformal mapping, and facial surface matching is then done via the conformal maps. The method is robust to complicated topologies and is not sensitive to surface quality. In [34], an improved ICP algorithm is used for accurate registration. The registration process is divided into two stages, the first being a rough registration on the depth maps.
Initially, some SIFT feature points are selected on the depth map image, and a transformation, computed with the SVD technique, is applied for the correct matching of the feature points. Accurate registration is then done with the ICP algorithm on the 3D point clouds. In [35], the authors introduce a method that solves face alignment and 3D face reconstruction simultaneously from 2D face images with pose and expression variations. Two sets of regressors are used, for landmark detection and 3D shape estimation, linked by a 3D-to-2D mapping matrix. The landmark regressor detects landmarks on the 2D face images, and the shape regressor generates the 3D shapes; the regressors mutually adjust the 3D face shape and the landmark points, finally producing pose–expression-normalized 3D (PEN3D) face shapes. In [36], the authors apply the ICP technique in non-rigid form: they find different non-rigid portions of the face and combine the corresponding non-rigid ICP results. The main contribution is describing and fitting 3D faces in a new form by learning a local statistical model for facial parts, applying the ICP algorithm to the individual parts before combining them into a single model. In [37], the authors develop a 3D morphable model. This type of model has the drawback of requiring a huge number of good-quality face scans for training; to remove this drawback, the scans are brought into dense correspondence with a mesh registration algorithm. Similarly, in [38], a 3D morphable face model (3DMM) built from 2D images is used to remove pose- and illumination-variation issues. The extended idea of this model remaps pose-variant faces to a frontal face by fusing the 2D pose-variant textured image with the 3D morphable model. In [39], the registration process is done in two stages.
In the coarse registration stage, the principal components of the unregistered and frontal face images are used to find the closest correspondence between points on the three-dimensional surfaces, with the centroids of the two point clouds used to calculate the smallest distance; fine registration is then computed using the ICP technique. In [40], the authors work with 3D facial motion sequences, statistically analyzing the change of 3D face shape in motion. They introduce landmark detection on the 3D motion sequences based on Markov random fields, and a multi-linear model is used to compute the registration: a learned multi-linear model, previously trained on a registered 3D face database, enables fully automatic registration of motion sequences of 3D faces. In [41], a mathematical model is used for 3D face registration with pose varying across 0°–90° in both positive and negative orientations. Initially, a basic landmark model is chosen against which the images are to be registered, and the Hausdorff distance metric is used to calculate the distance for alignment; all types of pose variation, i.e., pitch, yaw, and roll, are handled. In [42], the authors first introduce a feature localization algorithm based on a binary neural network technique, the k-nearest neighbor advanced uncertain reasoning architecture (kNN AURA), to train for and find the nose tip; the face area is then detected based on the nose tip. Registration of the 3D faces is subsequently done through integrated steps: first, PCA is used to remove misalignment roughly, then the misalignment about the oy- and oz-axes is minimized using the symmetry of the faces, and finally the ICP algorithm is used for correct alignment on the expression-invariant areas. In [43], the nose tip is used for 3D face registration via a maximum intensity algorithm, with rotational and translational parameters used for the registration.
The deformation angle of the pose-variant faces is calculated from the maximum intensity point, which denotes the nose tip. The method overcomes problems of ICP- and Procrustes analysis-based registration: before applying ICP, a rough alignment is needed first, and in Procrustes analysis various landmarks must be detected first. The proposed method, however, does not work for large poses. In [44], a non-rigid registration algorithm deforms a 3D template mesh model to best match a depth image. The authors do not use any landmark detection technique; instead, they introduce image-based non-rigid deformation with an automatic feature detection process. In [45], the authors introduce topology-adaptive non-rigid methods for registering 3D facial scans, combining non-rigid mechanisms and mesh subdivision to overcome the drawbacks of previous traditional methods. In [46], the authors initially localize the nose tip on the range image in an efficient way and then calculate the angular deviation between the nose tip of the frontal face and of the pose-variant faces with respect to the difference between the x- and y-positions of the depth image. After

6 Result Analysis and Discussion

In this section, we first discuss the actual need for a registration process in any face analysis system. We analyze the issue experimentally by developing a simple 3D face recognition system on three well-known 3D face databases: Frav3D, GavabDB, and Casia3D. We use a simple PCA technique for feature reduction, followed by a support vector machine (SVM) classifier for classification. All three input databases contain different varieties of face data, including pose variation. First, we register all pose-variant faces using the well-known ICP algorithm, taking 3D point clouds as input. We then construct depth images from the 3D point clouds as input to the recognition system. Two sets of components from the PCA technique, 100 and 150 components, are used for classification. Twofold cross-validation is applied for training and testing the recognition system on the individual datasets, including neutral faces and the other variations. Table 2, below, gives a comparative analysis of face recognition accuracies between inputs with ICP and inputs without ICP.

Table 2 Importance of registration based on 3D face recognition results of three individual 3D face databases
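The recognition pipeline described above (PCA features followed by a classifier) can be sketched in miniature. This toy version uses random vectors in place of flattened depth images and a nearest-neighbor match as a stand-in for the SVM classifier, so all names and data here are illustrative, not the chapter's actual experiment.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on flattened images X (n_samples x n_pixels) via the SVD
    of the centered data matrix."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project (centered) samples onto the principal components."""
    return (X - mean) @ components.T

# Two "subjects" with a few noisy samples each.
rng = np.random.default_rng(0)
templates = rng.normal(size=(2, 100))              # one template per subject
train = np.vstack([t + 0.05 * rng.normal(size=(5, 100)) for t in templates])
labels = np.repeat([0, 1], 5)

mean, comps = pca_fit(train, n_components=3)
feats = pca_transform(train, mean, comps)

# Classify a probe of subject 1 by its nearest neighbor in PCA space.
probe_img = templates[1] + 0.05 * rng.normal(size=100)
probe = pca_transform(probe_img[None, :], mean, comps)
pred = labels[np.argmin(np.linalg.norm(feats - probe, axis=1))]
```

In the chapter's actual experiment, the inputs are depth images reconstructed from (ICP-registered or unregistered) 3D point clouds, 100 or 150 PCA components are retained, and an SVM performs the final classification.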

From the results in the above table, it is clear that registration is a primary preprocessing step in any face analysis system. Next, Table 3 summarizes the experimental results of the different registration techniques already discussed in Sect. 5.

Table 3 Discussion on experimental result on different proposed registration techniques

7 Conclusion

Face image registration is one of the necessary processes in any facial analysis system for maintaining the robustness of the system. In this chapter, we have explained the concept of registration, along with some current work on pose detection, and shown that face registration is of enormous importance for face recognition.

Over the past decades, various registration techniques have been used in different applications. Here, we have mainly focused on 3D face registration, which is followed by landmark detection, face recognition, etc. There are two groups of registration techniques: rigid and non-rigid. The iterative closest point (ICP) algorithm is one of the most popular rigid registration techniques, and various other rigid and non-rigid techniques exist. The advantages and disadvantages of the different techniques, followed by an experimental analysis on 3D faces, have been discussed in this chapter. The chapter thus provides details of the recent registration techniques that work on either 3D face point clouds or 3D face depth images. In the future, the registration process could use a 3D voxel representation as input.