DNN-Based Talking Movie Generation with Face Direction Consideration

Ishikawa, Toru; Nose, Takashi; Ito, Akinori

doi:10.1007/978-3-030-03748-2_19

Conference paper
First Online: 11 November 2018

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 110)

Abstract

In this paper, we propose a method for generating a talking-head animation that takes the direction of the face into account. The proposed method parametrizes a facial image using the active appearance model (AAM) and models the AAM parameters using a feedforward deep neural network. Since the AAM is a two-dimensional face model, conventional AAM-based methods assume only a frontal face. Thus, when the generated face was combined with other parts such as a head and a body, the directions of the face and the head were often inconsistent. The proposed method models the shape parameters of the AAM using principal component analysis (PCA) so that the face direction and the movements of individual facial parts are modeled separately; we then substitute the face direction of the generated animation with that of the head part so that the directions of the face and the head coincide. We conducted an experiment to demonstrate that the proposed method can generate face animation with proper face direction.
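
As a rough illustration of the substitution step described in the abstract, the following Python sketch fits PCA to flattened shape vectors and replaces a subset of PCA coefficients in one shape sequence with those of another. It is not the authors' implementation: the function names `fit_pca` and `substitute_direction`, the parameter `n_dir`, and the assumption that the leading `n_dir` components are the ones that encode face direction are all hypothetical choices made for this example. The abstract does not state which components carry direction.

```python
import numpy as np


def fit_pca(shapes):
    """Fit PCA to shape vectors of size (n_frames, 2 * n_landmarks).

    Returns the mean shape and the principal axes as orthonormal rows.
    """
    mean = shapes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt


def substitute_direction(generated, head, mean, axes, n_dir=2):
    """Replace the first n_dir PCA coefficients of `generated` with those of `head`.

    Assumption (hypothetical): the first n_dir components capture face
    direction, and the remaining ones capture movement of facial parts.
    """
    gen_coef = (generated - mean) @ axes.T
    head_coef = (head - mean) @ axes.T
    # Keep facial-part movement from the generated face,
    # take direction from the head part.
    gen_coef[:, :n_dir] = head_coef[:, :n_dir]
    return gen_coef @ axes + mean


# Usage with synthetic data (4 landmarks, so 8 values per frame).
rng = np.random.default_rng(0)
train_shapes = rng.normal(size=(50, 8))
mean, axes = fit_pca(train_shapes)
generated = rng.normal(size=(5, 8))  # shapes predicted for the face
head = rng.normal(size=(5, 8))       # shapes observed on the head part
aligned = substitute_direction(generated, head, mean, axes, n_dir=2)
```

In practice the number of direction components would have to be chosen by inspecting what each principal component changes in the shape; the value 2 here is only a placeholder.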

Acknowledgment

Part of this work was supported by JSPS KAKENHI Grant Number JP17H00823.

Author information

Authors and Affiliations

Graduate School of Engineering, Tohoku University, Sendai, Miyagi, 980-8579, Japan
Toru Ishikawa, Takashi Nose & Akinori Ito

Corresponding author

Correspondence to Akinori Ito.

Editor information

Editors and Affiliations

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Graduate School of Engineering, Tohoku University, Sendai, Miyagi, Japan
Akinori Ito
Swinburne University of Technology, Hawthorn, VIC, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

Cite this paper

Ishikawa, T., Nose, T., Ito, A. (2019). DNN-Based Talking Movie Generation with Face Direction Consideration. In: Pan, JS., Ito, A., Tsai, PW., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_19

DOI: https://doi.org/10.1007/978-3-030-03748-2_19
Published: 11 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03747-5
Online ISBN: 978-3-030-03748-2
eBook Packages: Intelligent Technologies and Robotics (R0)
