Speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting

  • Conference paper
MultiMedia Modeling (MMM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Abstract

This paper proposes a system that generates speech-synchronized tongue animation from text or speech. First, an anatomically accurate physiological tongue model is built and used to generate a large number of tongue deformation samples from randomly sampled muscle activations. Second, these input and output samples are used to train a neural network that maps muscle activations to tongue contour deformations. Third, after the rigid tongue movement is removed, the neural network is used to estimate the non-rigid tongue movement parameters, namely the tongue muscle activations, from a collected database of X-ray tongue movement images of Mandarin Chinese phonemes. The estimates are then used to construct a database of tongue physemes (sequences of tongue muscle activations together with the rigid movement) corresponding to the Mandarin Chinese phoneme database. Finally, the physemes corresponding to the phonemes extracted from the input text or speech are blended according to the phoneme durations to drive the physiological tongue model and produce the speech-synchronized tongue animation. Simulation results demonstrate that the synthesized tongue animations are visually realistic and approximate the tongue medical data well.
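
The final step of the abstract, blending physemes according to phoneme durations, can be illustrated with a minimal sketch. The abstract does not specify the blending scheme, so this sketch assumes plain linear interpolation, and it simplifies each physeme to a single key vector of muscle activations reached at the midpoint of its phoneme. The function name `blend_physemes`, the `fps` parameter, and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np


def blend_physemes(keys, durations, fps=25):
    """Blend per-phoneme key vectors into a frame-by-frame activation track.

    Assumptions (not from the paper): one key vector per phoneme, reached at
    the midpoint of that phoneme, with linear interpolation in between.

    keys      : (n_phonemes, n_params) array-like of muscle activations
    durations : n_phonemes phoneme durations in seconds
    fps       : output frame rate of the animation
    """
    keys = np.asarray(keys, dtype=float)
    durations = np.asarray(durations, dtype=float)

    ends = np.cumsum(durations)          # end time of each phoneme
    centers = ends - durations / 2.0     # time at which each key is reached

    n_frames = int(round(ends[-1] * fps)) + 1
    times = np.arange(n_frames) / fps

    # np.interp works on 1-D data, so interpolate each parameter separately.
    # Before the first center and after the last one it holds the end value.
    track = np.column_stack(
        [np.interp(times, centers, keys[:, j]) for j in range(keys.shape[1])]
    )
    return times, track


# Hypothetical example: two phonemes, two muscle activations each.
times, track = blend_physemes([[0.0, 1.0], [1.0, 0.0]], [0.2, 0.2], fps=10)
```

Each row of `track` would then be passed to the physiological tongue model as the muscle activations for one animation frame. A smoother blend (for example, cubic or dominance-function based) could be substituted without changing the interface.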

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61572450, No. 61303150), the Open Project Program of the State Key Lab of CAD&CG, Zhejiang University (No. A1501), the Fundamental Research Funds for the Central Universities (WK2350000002), the Open Funding Project of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. BUAA-VR-16KF-12), and the Open Funding Project of State Key Laboratory of Novel Software Technology, Nanjing University (No. KFKT2016B08).

Author information

Corresponding author

Correspondence to Jun Yu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yu, J. (2017). Speech Synchronized Tongue Animation by Combining Physiology Modeling and X-ray Image Fitting. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_59

  • DOI: https://doi.org/10.1007/978-3-319-51811-4_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51810-7

  • Online ISBN: 978-3-319-51811-4

  • eBook Packages: Computer Science, Computer Science (R0)
