Tongue Mesh Extraction from 3D MRI Data of the Human Vocal Tract

  • Alexander Hewer (email author)
  • Stefanie Wuhrer
  • Ingmar Steiner
  • Korin Richmond
Conference paper
Part of the Mathematics and Visualization book series (MATHVISUAL)


In speech science, analyzing the shape of the tongue during human speech production is of great importance. In this field, magnetic resonance imaging (MRI) is currently regarded as the preferred modality for acquiring dense 3D information about the human vocal tract. However, the desired shape information is not directly available from the acquired MRI data. In this chapter, we present a minimally supervised framework for extracting the tongue shape from a 3D MRI scan. It combines an image segmentation approach with a template fitting technique and produces a polygon mesh representation of the identified tongue shape. Our evaluation focuses on two aspects. First, we investigate whether the approach can be regarded as independent of changes in tongue shape caused by different speakers and phones. Second, we check whether an average user who is not necessarily an anatomical expert can obtain acceptable results. In both cases, our framework shows promising results.


Keywords: Point Cloud, Vocal Tract, Magnetic Resonance Imaging Data, Hard Palate, Polygon Mesh



This study uses data from work supported by EPSRC Healthcare Partnerships Grant number EP/I027696/1 (“Ultrax”).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexander Hewer (1, 2, 3), email author
  • Stefanie Wuhrer (4)
  • Ingmar Steiner (2, 3)
  • Korin Richmond (5)

  1. Saarbrücken Graduate School of Computer Science, Saarbrücken, Germany
  2. DFKI Language Technology Lab, Saarbrücken, Germany
  3. Cluster of Excellence Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany
  4. INRIA Grenoble Rhône-Alpes, Saint Ismier, France
  5. Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
