1 Introduction

Realistic hair modeling is one of the most difficult tasks when digitizing virtual humans [3, 14, 20, 25, 27]. In contrast to objects that are easily parameterizable, such as the human face, hair spans a wide range of shape variations and can be highly complex due to its volumetric structure and the deformability of each strand. Although the capture methods of [2, 22, 26, 28, 38] can create high-quality 3D hair models, they require specialized hardware setups that are difficult to deploy widely. Chai et al. [5, 6] introduced the first simple hair modeling technique from a single image, but the process requires manual input and cannot properly generate the non-visible parts of the hair. Hu et al. [18] later addressed this problem by introducing a data-driven approach, but some user strokes were still required. More recently, Chai et al. [4] adopted a convolutional neural network to segment the hair in the input image to fully automate the modeling process, and [41] proposed a four-view approach for more flexible control.

Fig. 1. Hair reconstruction from a single-view image using HairNet.

However, these data-driven techniques rely on storing and querying a huge dataset of hair models and on computationally heavy refinement steps. They are therefore not feasible for applications that require real-time performance or have limited disk and memory budgets. More importantly, these methods reconstruct the target hairstyle by fitting retrieved hair models to the input image, which may capture the overall hair shape well but cannot handle fine details or achieve high accuracy. Moreover, since both the query and the refinement of hair models are based on an undirected 2D orientation match, in which a horizontal orientation tensor can point either left or right, they may produce hair with an incorrect growing direction or parting line, as well as implausible deformations along the z-axis.

To speed up the process and reconstruct hair that better preserves the style of the input image and looks more natural, we propose a deep learning based approach that generates the full hair geometry from a single-view image, as shown in Fig. 1. Different from recent advances that synthesize shapes in the form of volumetric grids [8] or point clouds [10] via neural networks, our method generates hair strands directly, which are better suited to non-manifold structures like hair and can achieve much higher detail and precision.

Our neural network, which we call HairNet, is composed of a convolutional encoder that extracts a high-level hair-feature vector from the 2D orientation field of a hair image, and a deconvolutional decoder that generates \(32\,\times \,32\) strand-features evenly distributed on the parameterized 2D scalp. The strand-features can be interpolated in the scalp space to reach a higher resolution (30K strands) and are further decoded into the final strands, represented as sequences of 3D points. In particular, the hair-feature vector can be seen as a compact and continuous representation of the hair model, which enables us to sample or interpolate plausible hairstyles efficiently in the latent space. In addition to the reconstruction loss, we introduce a collision loss between the hair strands and a body model to push the generated hairstyles towards a more plausible space. To further improve the accuracy, we use the visibility of each strand in the input image as a weight to modulate its loss.

Obtaining a training set with real hair images and ground-truth 3D hair geometry is challenging. We factor out the difference between synthetic and real hair data by using an intermediate 2D orientation field as the network input. This enables our network to be trained on widely accessible synthetic hair models and applied to real images without any changes. For example, the 2D orientation field can be computed from a real image by applying a Gabor filter on the hair region, which is automatically segmented using the method of [42]. Specifically, we synthesized a hair dataset composed of 40K different hairstyles and 160K corresponding 2D orientation images rendered from random views for training.

Compared to previous data-driven methods, which can take minutes and terabytes of disk storage for a single reconstruction, our method takes less than 1 second and only 70 MB of disk storage in total. We demonstrate the effectiveness and robustness of our method on both synthetic hair images and real images from the Internet, and show applications in hair interpolation and video tracking.

Our contributions can be summarized as follows:

  1. We propose the first deep neural network to generate dense hair geometry from a single-view image. To the best of our knowledge, it is also the first work to incorporate both collision and visibility in a deep neural network to deal with 3D geometries.

  2. Our approach achieves state-of-the-art resolution and quality, and significantly outperforms existing data-driven methods in both speed and storage.

  3. Our network provides the first compact and continuous representation of hair geometry, from which different hairstyles can be smoothly sampled and interpolated.

  4. We construct a large-scale database of around 40K 3D hair models and 160K corresponding rendered images.

2 Related Work

Hair Digitization. A general survey of existing hair modeling techniques can be found in Ward et al. [36]. For experienced artists, purely manual editing from scratch with commercial software such as XGen and Hairfarm offers the highest quality, flexibility, and controllability, but modeling a compelling and realistic hairstyle can easily take several weeks. To avoid tedious manipulation of individual hair fibers, several efficient design tools have been proposed [7, 11, 23, 37, 40].

Meanwhile, hair capturing methods have been introduced to acquire hairstyle data from the real world. They typically rely on high-fidelity acquisition systems, controlled recording sessions, and manual assistance, using multi-view stereo cameras [2, 9, 17, 22, 26, 28, 38], a single RGB-D camera [19], or thermal imaging [16].

More recently, single-view hair digitization methods have been proposed by Chai et al. [5, 6], but they can only roughly produce the frontal geometry of the hair. Hu et al. [18] later demonstrated the first system that can model entire hairstyles at the strand level from a single input image, using a database-driven reconstruction technique with minimal user interaction. A follow-up automatic method was proposed by [4], which uses a deep neural network for hair segmentation and augments a larger database for shape retrieval. To allow more flexible control over the side and back views of the hairstyle, Zhang et al. [41] proposed a four-view image-based hair modeling method that bridges the gap between multi-view and single-view hair capturing techniques. Since these methods rely on a large dataset for matching, speed is an issue and the final results depend heavily on the quality and diversity of the database.

Single-View Reconstruction using Deep Learning. Generation of 3D data by deep neural networks has been attracting increasing attention recently. Volumetric CNNs [8, 12, 21, 33] use 3D convolutional neural networks to generate voxelized shapes but are highly constrained by the volume resolution and the computational cost of 3D convolution. Although techniques such as hierarchical reconstruction [15] and octrees [31, 32, 35] can improve the resolution, generating details such as hair strands is still extremely challenging.

On the other hand, point clouds scale well to high resolution due to their unstructured representation. [29, 30] proposed unified frameworks to learn features from point clouds for tasks like 3D object classification and segmentation, but not generation. Following the pioneering work of PointNet, [13] proposed PCPNet to estimate local normals and curvatures from point sets, and [10] proposed a network for point set generation from a single image. However, point clouds remain a coarse representation and cannot capture the topological structure of hair strands.

3 Method

The entire pipeline contains three steps. A preprocessing step is first adopted to calculate the 2D orientation field of the hair region based on the automatically estimated hair mask. Then, HairNet takes the 2D orientation fields as input and generates hair strands represented as sequences of 3D points. A reconstruction step is finally performed to efficiently generate a smooth and dense hair model.

3.1 Preprocessing

We first adopt PSPNet [42] to produce an accurate and robust pixel-wise hair mask of the input portrait image, and then compute the undirected 2D orientation of each pixel in the hair region using a Gabor filter [26]. Using undirected orientations eliminates the need to estimate the hair growth direction, which otherwise requires extra manual labeling [18] or learning [4]. However, the hair region alone can be ambiguous, since it conveys neither the camera view nor the hair's scale and position relative to the human body. We therefore also add the segmentation masks of the human head and body to the input image. In particular, the head is obtained by fitting a 3D morphable head model to the face [20], and the body is positioned accordingly via a rigid transformation. All of these steps are automatic and run in real time. The final output is a \(3\,\times \,256\times 256\) image whose first two channels store the color-coded hair orientation and whose third channel indicates the segmentation into hair, body, and background.
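A minimal sketch of this preprocessing is given below, assuming a grayscale portrait img and a binary hair mask hair_mask from the segmentation step; the Gabor parameters and the doubled-angle color coding are illustrative assumptions, not the exact settings used in the paper.

```python
# Sketch: per-pixel undirected orientation of the hair region via a Gabor
# filter bank. Filter size, sigma, wavelength and the number of orientations
# are assumptions chosen for illustration.
import cv2
import numpy as np

def undirected_orientation_map(img, hair_mask, n_thetas=32, ksize=17):
    """Return a 3 x H x W map: color-coded orientation + hair mask."""
    gray = img.astype(np.float32) / 255.0
    thetas = np.linspace(0.0, np.pi, n_thetas, endpoint=False)
    responses = []
    for theta in thetas:
        kern = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                  lambd=8.0, gamma=0.5, psi=0.0)
        responses.append(np.abs(cv2.filter2D(gray, cv2.CV_32F, kern)))
    best = thetas[np.argmax(np.stack(responses, axis=0), axis=0)]
    out = np.zeros((3,) + gray.shape, dtype=np.float32)
    out[0] = np.cos(2.0 * best) * hair_mask   # doubled angle: direction-free
    out[1] = np.sin(2.0 * best) * hair_mask   # encoding of the orientation
    out[2] = hair_mask                        # segmentation channel (hair only here)
    return out
```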

3.2 Data Generation

Similar to Hu et al. [18], we first collect an original hair dataset of 340 3D hair models from public online repositories [1], align them to the same reference head, convert the meshes into hair strands, and resolve collisions between the hair and the body. We then enlarge the original hair set via mirroring and pair-wise blending.

Different from AutoHair [4], which simply uses volume boundaries to avoid unnatural combinations, we separate the hairs into 12 classes based on the styles shown in Table 1 and blend each pair of hairstyles within the same class to generate more natural examples. In particular, we cluster the strands of each hair into five central strands, so each pair of hairstyles can generate \(2^5-2\) additional combinations of central strands. The new central strands serve as guidance to blend the detailed hair. Instead of using all of the combinations, we randomly select a subset of them for each hair pair, leading to a total of over 40K hairs in our synthetic dataset.
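The bookkeeping behind the \(2^5-2\) combinations can be made explicit with a short, purely illustrative sketch (the strand clustering and the actual geometric blending are not shown):

```python
# For a pair of same-class hairstyles, each of the five central strands can be
# taken from parent A or parent B; excluding the two all-A / all-B cases gives
# 2**5 - 2 = 30 new central-strand combinations per pair.
from itertools import product

def central_strand_combinations(n_central=5):
    combos = []
    for mask in product((0, 1), repeat=n_central):
        if len(set(mask)) == 1:
            continue  # identical to one of the two parents
        combos.append(mask)
    return combos

print(len(central_strand_combinations()))  # 30
```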

Table 1. Hair classes and the number of hairs in each class. S refers to short, M to medium, L to long, X to very, s to straight, and c to curly. Some hairs are assigned to multiple classes if their style is ambiguous.

To obtain the corresponding orientation images of each hair model, we randomly rotate and translate the hair inside the viewport of a fixed camera and render 4 orientation images from different views. The rotation ranges from \(-90^\circ \) to \(+90^\circ \) for the yaw axis and from \(-15^\circ \) to \(+15^\circ \) for the pitch and roll axes. We also add Gaussian noise to the orientations to emulate real capture conditions.
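The view and noise sampling can be sketched as follows; only the angle ranges come from the text, while the translation range and noise level are assumptions:

```python
# Sketch of the random-view rendering setup used for data generation.
import numpy as np

def sample_view(rng):
    yaw = rng.uniform(-90.0, 90.0)             # degrees, range from the text
    pitch = rng.uniform(-15.0, 15.0)
    roll = rng.uniform(-15.0, 15.0)
    translation = rng.uniform(-0.05, 0.05, 3)  # assumed range (meters)
    return yaw, pitch, roll, translation

def add_orientation_noise(orient_img, rng, sigma=0.02):
    """Perturb the two orientation channels with Gaussian noise (assumed sigma)."""
    noisy = orient_img.copy()
    noisy[:2] += rng.normal(0.0, sigma, size=noisy[:2].shape)
    return noisy

rng = np.random.default_rng(0)
view = sample_view(rng)
```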

3.3 Hair Prediction Network

Hair Representation. We represent each strand as an ordered 3D point set \(\zeta =\{s_{i}\}_{i=0}^{M-1}\), evenly sampled with a fixed number of points (M = 100 in our experiments) from the root to the tip. Each sample \(s_{i}\) carries a position \(\mathbf {p}_{i}\) and a curvature \(c_{i}\). Although strands vary greatly in length, curliness, and shape, they all grow from fixed roots to free ends. To remove the variance caused by root positions, we represent each strand in a local coordinate frame anchored at its root.

The hair model can be treated as a set of N strands \(H=\zeta ^N\) with fixed roots, and can be formulated as a matrix \(A_{N\times M}\), where each entry \(A_{i,j} = (\mathbf {p}_{i,j}, c_{i,j})\) represents the jth sample point on the ith strand. In particular, we adopt the method in [34] to parameterize the scalp as a \(32\,\times \,32\) grid, and sample hair roots at the grid centers (N = 1024).
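To make the data layout concrete, the sketch below spells out the hair tensor and the root-relative normalization under the sizes stated above; the array names and helper are illustrative, not from the released code.

```python
# Hair representation sketch: N = 32*32 strands with fixed scalp roots,
# M = 100 samples per strand, each sample storing (x, y, z, curvature).
import numpy as np

N_ROOTS, M_SAMPLES = 32 * 32, 100

# A[i, j] = (p_x, p_y, p_z, c): the j-th sample of the i-th strand.
A = np.zeros((N_ROOTS, M_SAMPLES, 4), dtype=np.float32)

def to_root_frame(positions, roots):
    """Express sample positions relative to each strand's root.

    positions: (N, M, 3) world-space samples; roots: (N, 3) root positions.
    """
    return positions - roots[:, None, :]
```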

Fig. 2. Network architecture. The input orientation image is first encoded into a high-level hair feature vector, which is then decoded to \(32\,\times \,32\) individual strand-features. Each strand-feature is further decoded to the final strand geometry containing both sample positions and curvatures via two multi-layer perceptron (MLP) networks.

Network Architecture. As illustrated in Fig. 2, our network first encodes the input image into a latent vector and then decodes the target hair strands from it. The encoder uses convolutional layers to extract high-level features of the image. Different from the common practice of using a fully-connected layer as the last layer, we use 2D max-pooling to spatially aggregate the partial features (an \(8\,\times \,8\) grid in total) into a global feature vector z. This greatly reduces the number of network parameters.

The decoder generates the hair strands in two steps. The hair feature vector z is first decoded into individual strand feature vectors \(\{z_i\}_{i=0}^{N-1}\) via deconvolutional layers, and each \(z_i\) is further decoded into the final strand geometry \(\zeta \) via the same multi-layer fully connected network. This multi-scale decoding mechanism allows us to efficiently produce denser hair models by interpolating the strand features. According to our experiments, this achieves a more natural appearance than directly interpolating the final strand geometry.

It is widely observed that generative neural networks often lose high-frequency details, as the low-frequency components tend to dominate the training loss. Thus, apart from the 3D positions \(\{\mathbf {p}_{i}\}\) of each strand, our strand decoder also predicts the curvatures \(\{c_{i}\}\) of all samples. With the curvature information, we can reconstruct the high-frequency strand details.
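The following PyTorch sketch condenses the encoder-decoder described above. The layer counts, channel widths, and strand-feature dimension are placeholders chosen only so that the shapes match the stated sizes (a \(3\times 256\times 256\) input, \(8\times 8\) partial features, \(32\times 32\) strand features, and M = 100 samples per strand); they are not the paper's exact configuration.

```python
# Condensed sketch of the HairNet encoder-decoder; widths are illustrative.
import torch
import torch.nn as nn

class HairNetSketch(nn.Module):
    def __init__(self, feat_dim=512, strand_feat_dim=64, m_samples=100):
        super().__init__()
        # Encoder: strided convolutions, then 8x8 max-pooling to one vector z.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),          # 256 -> 128
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),         # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),        # 64 -> 32
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),       # 32 -> 16
            nn.Conv2d(256, feat_dim, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.MaxPool2d(8),                                              # 8x8 -> 1x1
        )
        # Decoder: deconvolutions from z to a 32x32 grid of strand features.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 256, 4, stride=4), nn.ReLU(),           # 1 -> 4
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),     # 4 -> 8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),      # 8 -> 16
            nn.ConvTranspose2d(64, strand_feat_dim, 4, stride=2, padding=1),     # 16 -> 32
        )
        # Two MLP heads shared across all strand features: positions and curvatures.
        self.pos_mlp = nn.Sequential(
            nn.Linear(strand_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, m_samples * 3))
        self.curv_mlp = nn.Sequential(
            nn.Linear(strand_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, m_samples))

    def forward(self, orient_img):                        # (B, 3, 256, 256)
        z = self.encoder(orient_img)                      # (B, feat_dim, 1, 1)
        strand_feats = self.decoder(z)                    # (B, sfeat, 32, 32)
        B, C, H, W = strand_feats.shape
        f = strand_feats.permute(0, 2, 3, 1).reshape(B, H * W, C)
        positions = self.pos_mlp(f).reshape(B, H * W, -1, 3)   # (B, 1024, M, 3)
        curvatures = self.curv_mlp(f)                           # (B, 1024, M)
        return z, strand_feats, positions, curvatures
```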

Loss Functions. We apply three losses on our network. The first two losses are the \(L_2\) reconstruction loss of the 3D position and the curvature of each sample. The third one is the collision loss between the output hair strand and the human body. To speed up the collision computation, we approximate the geometry of the body with four ellipsoids as shown in Fig. 3.

Given a single-view image, the shape of the visible part of the hair is more reliable than the invisible part, e.g. the inner and back hair. Thus we assign adaptive weights to the samples based on their visibility—visible samples will have higher weights than the invisible ones.

The final loss function is given by:

$$\begin{aligned} L = L_{pos} + \lambda _{1}L_{curv} + \lambda _{2}L_{col}. \end{aligned}$$
(1)

\(L_{pos}\) and \(L_{curv}\) are the loss of the 3D positions and the curvatures respectively, written as:

$$\begin{aligned} L_{pos} &= \frac{1}{NM}\sum _{i=0}^{N-1}\sum _{j=0}^{M-1} w_{i,j}\,\Vert \mathbf {p}_{i,j}-\mathbf {p}_{i,j}^*\Vert _2^2 \\ L_{curv} &= \frac{1}{NM}\sum _{i=0}^{N-1}\sum _{j=0}^{M-1} w_{i,j}\,(c_{i,j}-c_{i,j}^*)^2 \\ w_{i,j} &= \begin{cases} 10.0 & \text {if } s_{i,j} \text { is visible} \\ 0.1 & \text {otherwise} \end{cases} \end{aligned}$$
(2)

where \(\mathbf {p}_{i,j}^*\) and \(c_{i,j}^*\) are the corresponding ground truth position and curvature to \(\mathbf {p}_{i,j}\) and \(c_{i,j}\), and \(w_{i,j}\) is the visibility weight.
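A hedged PyTorch sketch of the two weighted losses in Eq. (2) is given below; how the per-sample visibility mask is obtained (e.g. projecting samples into the input view and depth-testing them) is assumed rather than shown.

```python
# Sketch of the visibility-weighted position and curvature losses of Eq. (2).
import torch

def weighted_reconstruction_loss(pos, pos_gt, curv, curv_gt, visible,
                                 w_vis=10.0, w_hid=0.1):
    """pos, pos_gt: (B, N, M, 3); curv, curv_gt: (B, N, M); visible: bool (B, N, M)."""
    w = torch.where(visible,
                    torch.full_like(curv, w_vis),
                    torch.full_like(curv, w_hid))
    l_pos = (w * (pos - pos_gt).pow(2).sum(dim=-1)).mean()
    l_curv = (w * (curv - curv_gt).pow(2)).mean()
    return l_pos, l_curv
```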

Fig. 3. Ellipsoids for collision test.

The collision loss \(L_{col}\) is written as the sum of each collision error on the four ellipsoids:

$$\begin{aligned} L_{col} = \frac{1}{NM}\sum _{k=0}^3 C_k \end{aligned}$$
(3)

Each collision error is calculated as the sum of the distance of each collided point to the ellipsoid surface weighted by the length of strand that is inside the ellipsoid, written

$$\begin{aligned} C_{k} = \sum _{i=0}^{N-1}\sum _{j=1}^{M-1} \Vert \mathbf {p}_{i,j} - \mathbf {p}_{i, j-1}\Vert \, \max (0, Dist_k) \end{aligned}$$
(4)
$$\begin{aligned} Dist_k = 1-\frac{(\mathbf {p}_{i,j,0}-x_k)^2}{a_k^2}-\frac{(\mathbf {p}_{i,j,1}-y_k)^2}{b_k^2}-\frac{(\mathbf {p}_{i,j,2}-z_k)^2}{c_k^2} \end{aligned}$$
(5)

where \(\Vert \mathbf {p}_{i,j} - \mathbf {p}_{i, j-1}\Vert \) is the \(L_1\) distance between two adjacent samples on the strand, and \(x_k\), \(y_k\), \(z_k\), \(a_k\), \(b_k\), and \(c_k\) are the center and semi-axis parameters of the kth ellipsoid.
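A sketch of the collision penalty of Eqs. (3)-(5) is shown below; the four ellipsoid centers and semi-axes approximating the head and body are assumed to be given as tensors.

```python
# Sketch of the ellipsoid collision loss: each sample that falls inside an
# ellipsoid is penalized, weighted by the (L1) length of its strand segment.
import torch

def collision_loss(pos, ellipsoids):
    """pos: (B, N, M, 3); ellipsoids: list of dicts with 'center' and 'axes' (3,) tensors."""
    B, N, M, _ = pos.shape
    seg_len = (pos[:, :, 1:] - pos[:, :, :-1]).abs().sum(dim=-1)  # (B, N, M-1)
    total = pos.new_zeros(())
    for e in ellipsoids:
        d = (pos[:, :, 1:] - e["center"]) / e["axes"]   # normalized offsets
        dist = 1.0 - d.pow(2).sum(dim=-1)               # > 0 means inside the ellipsoid
        total = total + (seg_len * torch.clamp(dist, min=0.0)).sum()
    return total / (B * N * M)                          # averaged over the batch as well
```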

Training Details. The loss weights in Eq. 1 are fixed to \(\lambda _{1}=1.0\) and \(\lambda _{2}=10^{-4}\). During training, we rescale all hair models so that they are expressed in metric units. We use ReLU for nonlinear activation and Adam [24] for optimization, and run the training for 500 epochs with a batch size of 32 and a learning rate of \(10^{-4}\) that is halved after 250 epochs.
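Putting the pieces together, a training-loop skeleton consistent with these settings might look as follows; the model, loss functions, data loader, and ellipsoid parameters refer to the earlier sketches and are assumptions rather than the released implementation.

```python
# Training skeleton: Adam, lr 1e-4 halved after 250 of 500 epochs, batch 32,
# total loss = L_pos + 1.0 * L_curv + 1e-4 * L_col (Eq. 1 with the stated lambdas).
import torch

model = HairNetSketch()                      # architecture sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[250], gamma=0.5)

for epoch in range(500):
    for orient_img, pos_gt, curv_gt, visible in loader:   # assumed DataLoader, batch 32
        z, feats, pos, curv = model(orient_img)
        l_pos, l_curv = weighted_reconstruction_loss(pos, pos_gt, curv, curv_gt, visible)
        l_col = collision_loss(pos, ellipsoids)           # assumed ellipsoid parameters
        loss = l_pos + 1.0 * l_curv + 1e-4 * l_col
        opt.zero_grad()
        loss.backward()
        opt.step()
    sched.step()
```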

Fig. 4. The orientation image (b) can be automatically generated from a real image (a) or from a synthesized hair model with 9K strands. The orientation map and a down-sampled hair model with 1K strands (c) are used to train the neural network.

3.4 Reconstruction

The output strands from the network may contain noise and sometimes lose high-frequency details when the target hair is curly. We therefore further refine the smoothness and curliness of the hair. We first smooth the hair strands with a Gaussian filter to remove noise. Then, we compare the predicted curvatures with the curvatures of the output strands; if the difference exceeds a threshold, we add offsets to the strand samples. In particular, we construct a local coordinate frame at each sample with one axis along the tangent of the strand, and apply an offset function along the other two axes using the curve generation function described in the work of Zhou et al. [39].
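The smoothing part of this refinement can be sketched with a 1D Gaussian filter applied along each strand; the kernel width is an assumption, and the curliness offsets of [39] are not reproduced here.

```python
# Sketch: denoise the predicted strands by filtering positions along the
# sample axis (root to tip) of each strand.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_strands(strands, sigma=2.0):
    """strands: (N, M, 3) sample positions; returns smoothed copies."""
    return gaussian_filter1d(strands, sigma=sigma, axis=1, mode="nearest")
```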

Fig. 5. Hair strand upsampling in the space of (b) the strand-features and (c) the final strand geometry. (d) shows a zoom-in of (c).

The network generates only 1K hair strands, which is insufficient to render a high-fidelity output. To obtain a higher resolution, traditional methods build a 3D direction field from the guide strands and regrow strands from a dense set of follicles using that field. However, this approach is time-consuming and cannot reconstruct an accurate hair model. Directly interpolating the hair strands is fast but can produce an unnatural appearance. Instead, we bilinearly interpolate the intermediate strand features \(z_i\) generated by our network and decode them into strands with the MLP strand decoder, which enables us to create hair models of arbitrary resolution.
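A sketch of this upsampling step is shown below, reusing the names from the earlier network sketch; the target follicle resolution is an arbitrary example.

```python
# Sketch: bilinearly upsample the 32x32 strand-feature grid to a denser
# follicle grid and decode each interpolated feature with the strand MLPs.
import torch
import torch.nn.functional as F

def upsample_hair(model, strand_feats, out_res=128):
    """strand_feats: (B, C, 32, 32) from the decoder; returns out_res**2 strands."""
    dense = F.interpolate(strand_feats, size=(out_res, out_res),
                          mode="bilinear", align_corners=True)
    B, C, H, W = dense.shape
    f = dense.permute(0, 2, 3, 1).reshape(B, H * W, C)
    positions = model.pos_mlp(f).reshape(B, H * W, -1, 3)   # e.g. 16K strands
    curvatures = model.curv_mlp(f)
    return positions, curvatures
```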

Figure 5 demonstrates that interpolating in the strand-feature space generates a more plausible hair model, whereas direct interpolation of the final strands can lead to artifacts such as collisions. This is intuitive: the strand-feature is a non-linear encoding of the strand, so interpolated features remain in a more plausible space.

Fig. 6. Reconstruction with and without using curliness.

Figure 6 demonstrates the effectiveness of adding curliness to our network. Without the curliness constraint, the network only learns the dominant growing direction and loses the high-frequency details. In this paper, we demonstrate all our results at a resolution of 9K to 30K strands.

4 Evaluation

4.1 Quantitative Results and Ablation Study

In order to quantitatively estimate the accuracy of our method, we prepare a synthetic test set with 100 random hair models and 4 images rendered from random views for each hair model. We compute the reconstruction errors on both the visible and invisible part of the hair separately using the mean square distance between points and the collision error using Eq. 3. We compare our result with Chai et al.’s method [4]. Their method first queries for the nearest neighbor in the database and then performs a refinement process which globally deforms the hair using the 2D boundary constraints and the 2D orientation constraints based on the input image. To ensure the fairness and efficiency of the comparison, we use the same database in our training set for the nearest neighbor query of [4] based on the visible part of the hair, and set the resolution at 1000 strands. We also compare with Hu et al.’s method [18] which requires manual strokes for generating the 3D hair model. But drawing strokes for the whole test set is too laborious, so in our test, we use three synthetic strokes randomly sampled from the ground-truth model as input. In Table 2, we show the error comparison with the nearest neighbor query results and the methods of both papers. We also perform an ablation test by respectively eliminating the visibility-adaptive weights, the collision loss and the curvature loss from our network.

From the experiments, we observe that our method outperforms all the ablation variants and Chai et al.'s method. Without the visibility-adaptive weights, the reconstruction error is about the same for the visible and invisible parts, whereas the reconstruction error of the visible hair decreases by around 30% for all the networks that apply the visibility-adaptive weights. The curvature loss also helps decrease the mean square distance error of the reconstruction. The experiments further show that using the collision loss leads to much lower collision error. The nearest-neighbor results have zero collision error because the hairs in the database contain no collisions.

In Table 3, we compare the computation time and disk usage of our method and the data-driven method at a resolution of 9K strands. Our method is about three orders of magnitude faster and uses only a small amount of storage space. The reconstruction time differs between straight and curly hairstyles because for straight hairstyles, which exhibit little curvature difference, we skip the step of adding curliness.

Table 2. Reconstruction error comparison. The errors are measured in metric units. The Pos Error refers to the mean square distance error between the ground-truth and the predicted hair. “-VAW” refers to eliminating the visibility-adaptive weights, “-Col” to eliminating the collision loss, and “-Curv” to eliminating the curvature loss. “NN” refers to the nearest neighbor query based on the visible part of the hair.
Table 3. Time and space complexity.

4.2 Qualitative Results

To demonstrate the generality of our method, we tested it on a variety of real portrait photographs, as shown in the supplementary materials. Our method can handle different overall shapes (e.g. short and long hairstyles). In addition, it can efficiently reconstruct different levels of curliness (e.g. straight, wavy, and very curly), since the network learns curliness as per-sample curvatures, which we use to synthesize the final strands.

In Figs. 8 and 9, we compare our single-view hair reconstruction results with AutoHair [4]. Both methods make reasonable inferences about the overall hair geometry in terms of length and shape, but the hair from our method preserves better local details and looks more natural, especially for curly hair. This is because Chai et al.'s method depends on the accuracy and precision of the orientation field generated from the input image, and the orientation fields of many curly hair images are noisy, with wisps that overlap each other. In addition, they use helix fitting to infer the depth of the hair, which may fail for very curly hair, as shown in the second row of Fig. 8. Moreover, Chai et al.'s method can only refine the visible part of the hair, so the reconstructed hair may look unnatural from views other than that of the input image, while the hair reconstructed with our method looks comparatively more coherent from additional views.

Fig. 7. Interpolation comparison.

Fig. 8. Comparison with AutoHair [4] in different views.

Figure 7 shows the interpolation results of our method. The interpolation is performed between four different hairstyles, and the results show that our method can smoothly interpolate between curly and straight as well as short and long hair. We also compare with Weng et al.'s interpolation method [37]: in Fig. 7, Weng et al.'s method produces many artifacts, while our method generates more natural and smooth results. These interpolation results indicate the effectiveness of our latent hair representation. Please refer to the supplemental materials for more interpolation results.
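For reference, latent-space interpolation can be sketched as a linear blend of the encoded hair-feature vectors followed by the usual decoding; the attribute names follow the earlier network sketch and are assumptions, not the released implementation.

```python
# Sketch: blend two hair-feature vectors and decode the blend into strands.
import torch

def interpolate_hairstyles(model, img_a, img_b, t=0.5):
    with torch.no_grad():
        z_a, _, _, _ = model(img_a)
        z_b, _, _, _ = model(img_b)
        z = (1.0 - t) * z_a + t * z_b            # linear blend in latent space
        feats = model.decoder(z)                 # 32x32 strand features
        B, C, H, W = feats.shape
        f = feats.permute(0, 2, 3, 1).reshape(B, H * W, C)
        pos = model.pos_mlp(f).reshape(B, H * W, -1, 3)
        curv = model.curv_mlp(f)
    return pos, curv
```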

We also show video tracking results (see Fig. 10 and supplemental video). It shows that our output may fail to achieve sufficient temporal coherence.

Fig. 9. Comparison with AutoHair [4] for local details.

Fig. 10. Hair tracking and reconstruction on video.

Fig. 11. Failure cases.

5 Conclusion

We have demonstrated the first deep convolutional neural network capable of performing real-time hair generation from a single-view image. By training an end-to-end network to directly generate the final hair strands, our method captures more hair details and achieves higher accuracy than the current state of the art. Using an intermediate 2D orientation field as the network input provides flexibility: with proper preprocessing, our network can be applied to various types of hair input such as images, sketches, and scans. By adopting a multi-scale decoding mechanism, our network can generate hairstyles of arbitrary resolution while maintaining a natural appearance. Thanks to the encoder-decoder architecture, our network provides a continuous hair representation, from which plausible hairstyles can be smoothly sampled and interpolated.

6 Limitations and Future Work

We found that our approach fails to generate exotic hairstyles such as kinky, afro, or buzz cuts, as shown in Fig. 11. We believe the main reason is that such hairstyles are absent from our training database; building a larger hair dataset that covers more variation could mitigate this problem. Our method also fails when the hair is partially occluded, so we plan to enhance training by adding random occlusions. In addition, we use face detection to estimate the pose of the torso in this paper, but this could be replaced by a learned segmentation of the head and body. Currently, the generated hair models are insufficiently temporally coherent across video frames; integrating temporal smoothness as a training constraint is an interesting future direction. Finally, although our network provides a compact representation of the hair, the latent representation carries no semantic meaning. It would be interesting to concatenate explicit labels (e.g. color) to the latent variable for controlled training.