
1 Introduction

Recently, the motion capture (MoCap) technique has been widely used in the animation field, greatly speeding up the production of facial animation. Research on facial MoCap and animation techniques has intensified due to their wide range of applications in gaming, security, the entertainment industry and human-computer interaction. With the accumulation of a large body of research in animation, research on facial animation has grown rapidly. Much like body MoCap, facial motions are generally captured by tracking the movements of a set of markers placed on a subject’s face, and the capture device records the motion trajectories of these markers. Nowadays, facial animations are widely used in various kinds of media. However, generating realistic facial animation remains challenging for animators, because the face is the most expressive and variable part of one’s appearance, and each detail of a face may carry different meanings for different people.

2 Related Works

In recent years, many experts and scholars have done effective work on realistic facial animation. Further in-depth discussions of approaches to facial animation can be found in Alexanderson et al. [1], Yu et al. [2], Leandro et al. [3] and Park [4]. Waters [5] defined a model for the muscles of the face that can be extended to any non-rigid object and does not depend on a specific topology or network. A combination of parameterized techniques with the muscle model was designed to perform complex articulations. He defined a control method for facial animation based on a few muscle functions, in which the displacement of individual vertices was defined relative to their location within each deformation volume. Our approach is inspired by this work to generate lifelike animation. Williams [6] described a method for animating faces directly from video using a set of warping kernels, where each warping kernel applies the motion of a particular point in the tracked image to the underlying mesh representation, while Guenter et al. [7] discussed the capture of facial expressions by tracking large sets of markers. Using this technique, highly realistic facial movements can be synthesized by representing animations in terms of both the change in geometry over time and the change in texture. They generated video-realistic 3D facial animations by capturing geometric shape variations and facial textures simultaneously from multi-view videos; however, the captured texture sequences can only be applied to the same subject. To obtain more realistic facial animation, Platt et al. [8] proposed face models with physical muscles. Even though muscles and skin tissues are simulated in muscle-based facial animation, such approximate models still have difficulty handling subtle wrinkles and creases. In this paper, we focus on the use of MoCap data to drive facial models and try to deploy effective and simple approaches to facial animation. Li [9] proposed a novel data-driven 3D facial MoCap data editing system based on automated construction of an orthogonal blend-shape face model and constrained weight propagation. Based on a collected facial MoCap dataset, a region-based principal component analysis was applied to build an orthogonal blend-shape face model, and the 3D facial MoCap editing problem was formalized as a blend-shape animation editing problem. Tu [10] proposed a facial animation system that captures both geometric information and subtle illumination changes; while tracking the geometric data, the expression details are recorded by ratio images. Lorenzo et al. [11] described methods to animate face meshes from motion capture data with minimal user intervention using a surface-oriented deformation paradigm; the techniques described allow the use of motion capture data with any facial mesh. Noh and Neumann [12] proposed a method for retargeting motions embedded in one mesh to another mesh of different shape and topology. The method assumes that the motion is fully defined across the surface of the source mesh and does not tackle the problem of extracting surface deformations from MoCap data; it uses scattered data interpolation techniques together with a heuristic approach to retarget the motions. Deng [13] presented a semi-automatic technique for directly cross-mapping facial motion capture data to pre-designed blend-shape facial models. The facial models need not be based on muscle actuation bases, and the geometric proportions of the blend-shape face models may differ from those of the captured subject. However, the difficulty lies in normalizing and aligning the tracked facial motion to bring all motion data into the same coordinate system, and the approach requires manual work in the setup stage, including selecting reference motion capture frames and tuning blend-shape weights.

Xu [14] carried out research on the motion capture and cloning of human facial expressions. In the model-building stage, he designed an exact muscle motion model that can be driven by captured motion data; in the expression-cloning stage, motion transfer was realized by calculating the texture-mapping variations between the template model and a specific model. However, in his work only 18 feature points, confined to the eyes and mouth, are extracted, and the face models are limited to the same topology. Shen [15] proposed a facial animation reconstruction method based on region divisions, dividing the facial model into different regions based on RBF mapping. She used different RBF functions in different facial regions to reconstruct the animation, and an influencing factor was then used to enhance the motion continuity of the area adjoining two adjacent regions; in this way a complete facial animation was reconstructed. Our approach also deploys region divisions, but uses a calibration method to adjust the deformed motion points so as to reconstruct the discontinuous motion. Lin [16] proposed a procedure to estimate 3D facial motion trajectories from front-view and mirror-reflected video clips. In his work, users had to manually specify corresponding features such as the mouth corners, nose tip and eye corners. For facial animation, a general facial model is separated into 11 regions; control points within a region can only affect vertices in that region, and interpolation is applied to smooth the jitter effect at the boundary between two regions. Fang [17] realized expression animation and simulation of personalized faces based on facial MoCap data. In his work, a scheme of functional region partitioning for cross-mapping and driving facial motions was proposed, together with a pre-computing algorithm to reduce the computational cost. He divided the human facial model into functional regions through interactive technology and configured facial markers for the target model. Based on the radial basis function (RBF) method, he built a cross-mapping algorithm to generate motion data for personalized facial models, and virtual markers were used to enhance the motion continuity of the area between two adjacent functional regions. On the basis of Fang’s [17] work, we propose a different region partition method that achieves realistic facial animation without using virtual markers. Yang et al. [18] animated a new target face with the help of real facial expression samples. They transferred source motion vectors through a statistical face model to generate a reasonable expression on the target face, and used local deformation constraints to refine the animation results. Weise et al. [19] achieved real-time performance using a customized PCA tracking model. With a 3D sensor, the user is recorded in a natural environment, and the 3D facial dynamics can be reconstructed in real time without face markers or intrusive lighting. They used a face tracking algorithm that combines geometry and texture registration with pre-recorded animation priors in a single optimization. Although their structured-light scanner generates high-quality depth maps, online tracking is limited to geometry registration. Tena et al. [20] presented a linear face modeling approach that generalizes to unseen data while allowing click-and-drag interaction for animation; the model is composed of a collection of PCA sub-models that are independently trained but share boundaries.
Because the geodesic distance provides an approximate description of the distance along mesh surfaces, which is important for expression synthesis, Wan et al. [21] proposed a new scheme to synthesize natural facial expressions using the geodesic distance instead of the Euclidean distance in the RBF interpolation. Based on the finite element method (FEM), Sifakis et al. [22] proposed a method for automatically extracting muscle activations from captured facial expression motion data. They built an anatomically accurate model of the facial musculature, passive tissue and underlying skeletal structure using volumetric data acquired from a living male subject. However, building the model is difficult and the computation is very expensive.

3 Algorithm Overview

In this paper, we describe two methods of facial animation reconstruction. One is realized on the basis of facial model divisions, while the other is realized without divisions.

3.1 Facial Animation with Model Divisions

In our work, in order to animate a facial mesh using MoCap data, the displacements of the tracked feature points must be interpolated to determine the motion of individual vertices. Discontinuities in the mesh must also be taken into account, allowing the lips to move independently. The flow of facial animation reconstruction with the model division method is shown in Fig. 1.

Fig. 1. The flow chart of facial animation with model divisions

In Fig. 1, a facial model is given as the input of the animation task. Firstly, the general face is separated into four regions: lower lip, upper face, left eye and right eye. Secondly, the deformed motion vectors of all the points in the different regions are obtained at the first frame. Thirdly, the coordinates of the model points in the lower lip are calculated by deploying only the local captured motion data of the lower lip, while the coordinates of the model points in the upper face and eyes are obtained by employing all the captured data based on RBF. Fourthly, we adjust the computed coordinates of the model points in each region to obtain the exact deformed motion data. Finally, we obtain the corresponding deformed animation for the captured motion data in each frame; this animation is the final output of facial animation reconstruction.

3.2 Facial Animation Without Model Divisions

The process of the facial animation reconstruction method without model divisions is given in Fig. 2. A facial model is given as the input of the animation task. Firstly, the one-ring neighbors of each point are found, and the distances between each point and its one-ring neighbors are computed. Secondly, the points’ coordinates after deformation are obtained through RBF mapping. Thirdly, for each point in the facial model, the local neighbors’ motion vectors and their corresponding weights are calculated based on the muscle motion mechanism. Fourthly, we adjust each deformed point computed in the second step. Finally, the adjusted animation is the output of the facial reconstruction.

Fig. 2. The flow chart of facial animation without model divisions

4 Animation Reconstruction Based on Model Division

4.1 RBF Method

The Radial Basis Functions (RBFs) method is suited to all types of discrete-point interpolation problems. It is widely used in smooth surface construction, motion capture data interpolation and data recovery [23, 24]. The retargeting of motion capture points is performed by the RBF method, which provides a mapping from the space of the motion capture data to that of the facial mesh.
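To make the RBF mapping concrete, the following is a minimal C++ sketch of scattered-data RBF interpolation from the MoCap marker space to the mesh space. It assumes the Eigen linear-algebra library and a Gaussian kernel; the paper does not specify the kernel, solver or class names, so `RBFMapping`, `fit`, `map` and `kernelWidth` are illustrative choices only.

```cpp
#include <cmath>
#include <vector>
#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::Vector3d;

// Scattered-data RBF mapping: fitted from reference markers to the matching
// feature points on the target mesh, then evaluated on arbitrary points.
struct RBFMapping {
    std::vector<Vector3d> centers;  // marker positions in the reference frame
    MatrixXd weights;               // n x 3, one column per output coordinate
    double sigma = 1.0;             // Gaussian kernel width (assumed)

    static double kernel(double r, double width) {
        return std::exp(-(r * r) / (2.0 * width * width));
    }

    void fit(const std::vector<Vector3d>& markers,
             const std::vector<Vector3d>& featurePoints,
             double kernelWidth) {
        const int n = static_cast<int>(markers.size());
        centers = markers;
        sigma = kernelWidth;
        MatrixXd phi(n, n), y(n, 3);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j)
                phi(i, j) = kernel((markers[i] - markers[j]).norm(), sigma);
            y.row(i) = featurePoints[i].transpose();
        }
        // Solve Phi * W = Y for the RBF weights.
        weights = phi.colPivHouseholderQr().solve(y);
    }

    // Map a point from the MoCap space into the mesh space.
    Vector3d map(const Vector3d& p) const {
        Vector3d out = Vector3d::Zero();
        for (int j = 0; j < static_cast<int>(centers.size()); ++j)
            out += kernel((p - centers[j]).norm(), sigma) *
                   weights.row(j).transpose();
        return out;
    }
};
```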

For facial animation, it is insufficient to animate expression details by RBF mapping alone because the facial model is discontinuous. For example, when the lower lip moves, we do not expect the upper lip to be dragged along with it. To handle this, the facial model is divided into functional regions, and a calibration method is used to adjust the deformed motion data to ensure their exactness. With this framework, we can display delicate facial expressions without additional complexity, so the proposed method can efficiently generate facial animation with realistic expressions.

4.2 Model Divisions

In order to reconstruct the discontinuity of facial expression, especially around the mouth, the general face is separated into four regions: lower lip, upper face, left eye and right eye. The separation method is shown in Fig. 3.

Fig. 3. Divisions of the facial model

Here, the box with labeled numbers is the OBB bounding box of the virtual face, and the labeled numbers represent the directions. A cutting mesh is used to separate the mouth so that it can move freely, and a local mesh is used to mark the lower lip. The left and right eyes are each treated as an independent region, shown by the yellow circles. The remaining area, excluding the lower lip and eyes, is considered the upper region. With this partition, each part can move independently when the mouth moves.
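The division itself is an interactive, geometry-based step. As a rough illustration only (not the paper’s exact OBB procedure), the sketch below tags each mesh vertex with one of the four regions using simple distance and box tests; the eye centers, eye radius and mouth thresholds are assumed parameters.

```cpp
#include <cmath>
#include <Eigen/Dense>

using Eigen::Vector3d;

enum Region { UpperFace, LowerLip, LeftEye, RightEye };

// Rough stand-in for the interactive region marking in Fig. 3: the eyes are
// taken as spheres around assumed centers, the lower lip as a box below an
// assumed lip-separation height, and everything else as the upper face.
Region classifyVertex(const Vector3d& v,
                      const Vector3d& leftEyeCenter,
                      const Vector3d& rightEyeCenter,
                      double eyeRadius,
                      double lipSplitHeight,   // height of the mouth separation
                      double mouthCenterX,
                      double mouthHalfWidth) {
    if ((v - leftEyeCenter).norm() < eyeRadius)  return LeftEye;
    if ((v - rightEyeCenter).norm() < eyeRadius) return RightEye;
    const bool inMouthBox =
        std::abs(v.x() - mouthCenterX) < mouthHalfWidth &&
        v.y() < lipSplitHeight;
    return inMouthBox ? LowerLip : UpperFace;
}
```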

4.3 Facial Animation Reconstruction Algorithm

Feature points are the positions where markers are placed, and the locations of the tracked markers in each frame indicate where the facial features are. For the points of the facial model that are not feature points, the RBF data scattering method is utilized to infer their displacements. The RBF is well known for its interpolation capability in both 2D and 3D, and it is used to provide a mapping from the space of the source MoCap data to that of the target mesh.

4.3.1 Facial Motion Retargeting

Retargeting of MoCap points is performed by scattered data interpolation. RBF is used to create a mapping between a static frame of the captured sequence and the target mesh; in subsequent frames, the same mapping is used to transform the movement of the captured markers into the space of the target mesh. When the personalized facial models are constructed, the captured facial motions are applied to the control points on the facial model. Once the displacements of all control points are determined, the facial model can be deformed by the RBF data scattering method. Repeating this process frame by frame, we can generate facial animations according to the captured facial motion data. On this basis, however, some unnatural facial expressions still exist; the major reason is that the mouth cannot be reconstructed realistically with the whole facial mesh.
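As a usage sketch of the retargeting step described above, the mapping can be fitted once on a neutral captured frame and then applied to the markers of every subsequent frame. It reuses the hypothetical `RBFMapping` type from Section 4.1; `retargetSequence` and its parameter names are illustrative.

```cpp
#include <vector>
// Reuses RBFMapping and Vector3d from the sketch in Section 4.1.

// Fit the mapping once between a static captured frame and the target mesh,
// then transform every frame's markers into mesh space as control points.
std::vector<std::vector<Vector3d>> retargetSequence(
        const std::vector<Vector3d>& neutralFrameMarkers,
        const std::vector<Vector3d>& meshFeaturePoints,
        const std::vector<std::vector<Vector3d>>& capturedFrames,
        double kernelWidth) {
    RBFMapping retarget;
    retarget.fit(neutralFrameMarkers, meshFeaturePoints, kernelWidth);

    std::vector<std::vector<Vector3d>> controlPointsPerFrame;
    for (const auto& frame : capturedFrames) {
        std::vector<Vector3d> controlPoints;
        controlPoints.reserve(frame.size());
        for (const auto& marker : frame)
            controlPoints.push_back(retarget.map(marker));
        // These control points then drive the RBF data-scattering
        // deformation of the rest of the mesh.
        controlPointsPerFrame.push_back(std::move(controlPoints));
    }
    return controlPointsPerFrame;
}
```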

In Fang’s [17] work, realistic facial reconstruction was achieved by adding virtual markers, which are hard to place on facial models and add extra computation time. On the basis of his work, we also deploy region divisions, but unlike his method, we mainly separate the mouth area into two parts, the lower lip and the upper face, and a calibration method is used to adjust the deformed motion data, which reconstructs the discontinuity of facial expressions without using virtual markers. Firstly, the deformation of the facial model is obtained by deploying the RBF mapping at the first frame. Secondly, the whole set of captured motion data is used to obtain the deformed points in the upper face and eyes. Thirdly, the deformation of the lower lip is obtained by deploying only the captured motion data of the lower lip. Because the resulting deformed motion data are not yet exact, we finally need to adjust their values. The concrete steps are given below.

4.3.2 Adjusting Deformed Motion Data

Given a face model and captured motion sequences, the animation process works in the following five steps.

Step 1: Given the first-frame MoCap data and the facial model, the lower lip is deformed by the RBF data scattering method using only the local MoCap data of the lower lip, while the upper face and eyes are deformed based on RBF using all the MoCap data. In this way, all the points’ deformed movements in the first frame can be computed, which are marked as \( V_{i} \);

Step 2: For the lower-lip region, we compute the coordinates of the points based on RBF using the local MoCap data of the lower lip, and we denote the deformed motion data as \( P_{t} \);

Step 3: For the upper face and eyes, we compute the points’ coordinates based on RBF using the whole MoCap data, and we likewise denote the deformed motion data as \( P_{t} \);

Step 4: For the deformed facial model, we adjust the deformed points by subtracting the motion vectors computed in Step 1:

$$ P_{t}^{'} = P_{t} - V_{i} $$
(1)

Step 5: We repeat the previous process frame by frame, and the adjusted motion data in each frame are taken as the final deformed motion data.
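The five steps can be summarized in a short per-frame sketch, reusing the hypothetical `RBFMapping` and `Region` types from the sketches in Sections 4.1 and 4.2. Here `lipMapping` is assumed to be fitted from the lower-lip markers only and `globalMapping` from all markers, and `firstFrameOffsets` holds the first-frame movements \( V_{i} \) from Step 1; all names are illustrative.

```cpp
#include <vector>
// Reuses RBFMapping, Region and Vector3d from the earlier sketches.

// One frame of the adjusted reconstruction (Steps 2-4, Eq. 1).
std::vector<Vector3d> animateFrame(
        const std::vector<Vector3d>& restVertices,
        const std::vector<Region>& regions,             // from the model division
        const std::vector<Vector3d>& firstFrameOffsets, // V_i from Step 1
        const RBFMapping& lipMapping,                   // lower-lip markers only
        const RBFMapping& globalMapping) {              // all markers
    std::vector<Vector3d> adjusted(restVertices.size());
    for (size_t i = 0; i < restVertices.size(); ++i) {
        // Steps 2-3: deform with the local or the global mapping.
        const RBFMapping& m =
            (regions[i] == LowerLip) ? lipMapping : globalMapping;
        const Vector3d Pt = m.map(restVertices[i]);
        // Step 4, Eq. (1): P'_t = P_t - V_i.
        adjusted[i] = Pt - firstFrameOffsets[i];
    }
    return adjusted;  // Step 5: this is repeated for every frame.
}
```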

5 Animation Reconstruction Without Model Divisions

The disadvantage of the above approach is that each model requires a mask to be defined in order to enforce the discontinuous nature of the face. To solve this problem, in this section we describe how to apply the muscle motion mechanism to reconstruct facial animation without model divisions. Our method is guided by the muscle motion mechanism proposed by Waters [5]. Considering the nonlinear relationship between a marker and its neighbors, for each point we compute the movements of its one-ring neighbors based on the muscle motion mechanism.

In order to animate a facial model using MoCap data, the displacements of the tracked feature points must be interpolated to determine the motion of individual vertices.

5.1 Muscle Motion Mechanism

Waters’ muscle model [5] is a widely used animation model. In his work, the muscle \( V_{1}V_{2} \) is determined by two points: the point of bony attachment and the point of skin attachment. At the point of attachment to the skin we can assume maximum displacement, and at the point of bony attachment zero displacement. The displacement falls off through the adjoining tissue, both across the sector \( P_{m}P_{n} \) and along \( V_{1}P_{s} \). Using a non-linear interpolant, it is possible to represent the simple action of a muscle as in Fig. 4, where the point \( P(x,y) \) is displaced to \( P^{'}(x^{'},y^{'}) \).

Fig. 4. Muscle vector model [5]

Here \( V_{1} \) and \( V_{2} \) are used to construct a linear muscle. Any point \( P \) located in the mesh sector \( V_{1}P_{r}P_{s} \) is displaced to \( P^{'} \) along the direction of the vector \( PV_{1} \). For a point \( P \) lying in the field \( V_{1}P_{n}P_{m} \), the radial displacement factor \( R \) is defined as:

$$ R = \cos \left( {\left( {1 - \frac{D}{{R_{s} }}} \right) \times \frac{\pi }{2}} \right) $$
(2)

For a point \( P \) lying in the field \( P_{n}P_{r}P_{s}P_{m} \), the radial displacement factor \( R \) is defined as:

$$ R = \cos \left( {\frac{{D - R_{s} }}{{R_{f} - R_{s} }} \times \frac{\pi }{2}} \right) $$
(3)
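As a small worked sketch of Eqs. (2) and (3), the radial factor \( R \) can be computed directly from the distance \( D \) between the point and the bony attachment \( V_{1} \), with \( R_{s} \) and \( R_{f} \) bounding the two influence zones. The piecewise form and the zero value outside \( R_{f} \) are the usual reading of Waters’ model and are assumptions here.

```cpp
#include <cmath>

// Radial fall-off factor of Waters' linear muscle (Eqs. 2 and 3).
double radialFactor(double D, double Rs, double Rf) {
    const double halfPi = std::acos(0.0);            // pi / 2
    if (D <= Rs)                                     // field V1 Pn Pm, Eq. (2)
        return std::cos((1.0 - D / Rs) * halfPi);
    if (D <= Rf)                                     // field Pn Pr Ps Pm, Eq. (3)
        return std::cos((D - Rs) / (Rf - Rs) * halfPi);
    return 0.0;                                      // outside the muscle zone
}
```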

5.2 Animation Reconstruction Based on Muscle Motion Mechanism

The deformed marker positions can be adjusted by applying the muscle motion mechanism. In this paper, the one-ring neighbors’ movements are utilized to achieve a reliable algorithm.

Two points are the main reasons for using one-ring neighbors to adjust the deformed motion data. On one hand, the global scattered RBF interpolation method cannot solve the problem of the facial model’s discontinuity. On the other hand, for each point, its farthest neighbor has the least influence on its movement, while its one-ring neighbors influence the current point’s movement the most. Therefore, we propose a muscle motion mechanism based on one-ring neighbors to adjust the deformed motion data.

The concrete process of facial animation reconstruction without model divisions works as follows:

Step 1: Given a facial model, for each point in the facial mesh, we find its one-ring neighbors and compute the distances between the current point and its one-ring neighbors.

Step 2: By deploying all the MoCap data, we compute the deformed motion data based on RBF, and we mark the deformed motion data as \( P_{t} \);

Step 3: For each point, we compute its one-ring neighbors’ deformed movements and their corresponding weights by applying the muscle motion mechanism based on one-ring neighbors:

$$ \omega_{i} = \cos \left( {\frac{{d_{i} - R_{s} }}{{R_{f} - R_{s} }} \times \frac{\pi }{2}} \right) $$
(4)

Here, \( \omega_{i} \) is the weight of the motion vector formed by the deformed point and the source point, and \( d_{i} \) is the distance between the current point and its i-th one-ring neighbor. Among the one-ring neighbors of each point, \( R_{s} \) is the nearest distance to the current point and \( R_{f} \) is the farthest distance to the current point.

Step 4: We adjust the deformed motion data by subtracting the weighted local deformed motion vectors computed in Step 3:

$$ P_{t}^{'} = P_{t} - \frac{{\sum\limits_{i = 1}^{n} {\left( {\omega_{i} V_{i} } \right)} }}{{\sum\limits_{i = 1}^{n} {\omega_{i} } }} $$
(5)

Here, \( P_{t}^{'} \) is the adjusted point, \( V_{i} \) is the deformed motion vector of the i-th one-ring neighbor, and n is the number of one-ring neighbors of the current point.

Step 5: We repeat the above steps frame by frame and then get the final adjusted motion data in each frame.
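A compact sketch of Steps 3 and 4 (Eqs. 4 and 5) for a single mesh point is given below, assuming the one-ring neighbor motion vectors \( V_{i} \) and distances \( d_{i} \) from Steps 1–3 are already available; the function and variable names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
#include <Eigen/Dense>

using Eigen::Vector3d;

// Adjust one RBF-deformed point P_t with the weighted average of the motion
// vectors of its one-ring neighbors (Eqs. 4 and 5).
Vector3d adjustPoint(const Vector3d& Pt,                          // deformed point
                     const std::vector<Vector3d>& neighborMotion, // V_i
                     const std::vector<double>& neighborDist) {   // d_i
    const double Rs = *std::min_element(neighborDist.begin(), neighborDist.end());
    const double Rf = *std::max_element(neighborDist.begin(), neighborDist.end());
    const double halfPi = std::acos(0.0);
    Vector3d weightedSum = Vector3d::Zero();
    double weightTotal = 0.0;
    for (size_t i = 0; i < neighborMotion.size(); ++i) {
        // Eq. (4): cosine fall-off weight; equal distances get weight 1.
        const double w = (Rf > Rs)
            ? std::cos((neighborDist[i] - Rs) / (Rf - Rs) * halfPi)
            : 1.0;
        weightedSum += w * neighborMotion[i];
        weightTotal += w;
    }
    // Eq. (5): subtract the weighted average of the neighbor motion vectors.
    return Pt - weightedSum / weightTotal;
}
```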

6 Experiments and Analysis

In our work, an actor with 60 markers on his face was directed to perform facial expressions. To solve the above-mentioned problems and to test the feasibility of the corresponding solutions, we developed a facial animation system for passive optical MoCap.

To verify the robustness and efficiency of the system, several experiments on expression reconstruction have been done; the experimental details are given in the following paragraphs. The testing platform is a compatible PC with a Pentium(R) Dual-Core E5400 CPU, 2 GB of memory and the Windows 7 operating system.

6.1 Experiment Setting

This paper aims to reconstruct facial animation using motion data captured from the passive optical MoCap system DVMC-8820 [18], which is composed of eight infrared (IR) cameras with four million pixels each and a 60 Hz capture rate. In the experiments, 60 infrared sensor markers are pasted on the face of the performer. During the performance, the movement of the head is limited to a small range: the rotation angle is less than 5 degrees and the global shift is less than 1/20 of the head size, as shown in Fig. 5.

Fig. 5. Illustration of the facial MoCap environment and facial marker setup

6.2 Results and Analysis

In order to validate the feasibility of facial expression reconstruction, we performed several experiments on the VC++ platform with the OpenGL graphics library; the results are shown as follows.

6.2.1 Validation for the Facial Animation Reconstruction with Model Divisions

To validate the effect of the facial animation, we performed several experiments, whose results are shown in Fig. 6.

Fig. 6. The facial animation from two models

In the experiment, we reconstructed facial animation by deploying the motion data captured from one person. As seen from Fig. 6, the reconstructed expression is realistic to some degree. However, in some cases there still exist unnatural expressions, especially when the mouth moves frequently or actively.

By animating a face with these visually important facial expressions, the resulting facial animations are much more realistic, lifelike and expressive. Moreover, the MoCap data can be applied to different facial models. Hence, the facial animation data recorded from a live subject’s performance can be reused, which greatly reduces the animator’s workload.

6.2.2 Validation for Facial Animation Without Model Divisions

In order to validate the effect of facial animation reconstruction without model divisions, we apply the local muscle motion mechanism based on one-ring neighbors to adjust the deformed motion data after RBF mapping; the result is shown in Fig. 7.

Fig. 7. Frames of facial motion reconstruction without model divisions

As seen from Fig. 7, the reconstructed facial animation resolves the discontinuity of the mesh to some degree. In this way, we avoid dividing the facial model into different regions. However, some detailed expressions are missed.

7 Conclusions

In this work, we present a facial animation system that generates expressive facial gestures, and a facial animation method with model divisions is proposed. Besides dividing the facial model into four regions to solve the discontinuity of the mesh, RBFs are used to map the MoCap data to the mesh points. In addition, instead of dividing the facial model into different regions, we also present a facial animation reconstruction method without model divisions, which first finishes the mapping through RBF and then adjusts the deformed motion data based on one-ring neighbors using the muscle motion mechanism. The resulting facial animations are lifelike and expressive.

Based on the tracked facial motion data, the method of facial animation with model divisions is applied to reconstruct lifelike animation. The success of this method is mainly due to the fact that it not only makes use of all the MoCap data but also exploits the rich correlations contained in the facial MoCap data.

Based on the muscle motion mechanism, the method of facial animation without model divisions is applied to reconstruct facial expressions. This method automatically avoids dividing the model mesh. Furthermore, the adjusting method deploys the muscle motion mechanism based on one-ring neighbors. In this way, we obtain a fair result for facial animation.

Using the methods above, we realize the whole process of facial expression reconstruction, which is efficient and meets the naturalness requirements of facial expressions. However, for facial animation with model divisions, each model requires a set of masks to be defined in order to enforce the discontinuous nature of the face. For the facial animation method without model divisions, some details are missed, so subtle expressions cannot be obtained. As a result, the key point of our future work is to design a robust method that can reconstruct facial animation without model divisions.